'C3x Block Repeat

Contributed by Alex Tessarolo



Design Problem The setup time for a repeat block is four cycles. How can I move most of the setup to
an initialization phase and reduce the overhead during algorithm execution?

Solution The repeat block requires that (1) the RM bit in the status register be set, and
(2) the RE, RS, and RC registers be loaded. RE and RS can be pre-initialized.
During program execution, RC is loaded with the loop count and the RM bit is set
with an OR instruction.

Figure 1 shows examples for comparison. The algorithm in Figure 1.b, which contains
the pre-initialized RE and RS registers, will execute faster. If executed repeatedly,
the cycle savings could be significant.



a. Standard RPTB Initialization

            .text
            LDI     N-1,RC
            RPTB    InnerLoop           ; (1)
            .                           ; first loop inst
            .
InnerLoop:  .                           ; last loop inst

b. Faster RPTB Execution

; initialize pointers to RS and RE
            .data
RPTBEndAddr     .word   RPTBEnd
RPTBStartAddr   .word   RPTBStart
; initialize RE and RS
            .text
            LDP     RPTBEndAddr
            LDI     @RPTBEndAddr,RE
            LDP     RPTBStartAddr
            LDI     @RPTBStartAddr,RS
            .
            .
            .text
            LDI     N-1,RC
            OR      0100h,ST            ; RM = 1 (2)
RPTBStart   .                           ; first loop inst
            .
RPTBEnd     .                           ; last inst

Note 1: RPTB InnerLoop is a 4-cycle instruction.
Note 2: OR 0100h,ST is a 1-cycle instruction.

Figure 1. Comparison of RPTB algorithms
Avoiding False Interrupts on the 'C3x

Contributed by Randy Preskia



Design Problem TMS320C30 interrupts are internally latched on the falling edge of H1 (see pp. 6-20
and 13-38 of the TMS320C3x User's Guide). If the interrupt is held low for three or
more H1 cycles, multiple interrupts may occur.

Solution The solution is to add a PAL clocked by H1 which intercepts the interrupt. This
logic will hold the interrupt (INTx') low for two H1 cycles. The external interrupt
(INTx) may be held low longer or shorter, but must go high before the interrupt can
be reasserted. For four external interrupts, the same logic may be repeated in the
same PAL.




Figure 1. State diagram 



Figure 2. Karnaugh map




Figure 3. Logic diagram 
Bit-reversed Addressing Without Data Alignment on the 'C3x



Contributed by Tim Grady 



Design Problem Bit-reversed addressing mode requires that the n-element array be aligned on an
n-word boundary. When n is large, this may result in a large "hole" in the memory
map. To use memory more efficiently, a technique to use bit-reversed addressing
without data alignment is required.

Solution AR2 points to the data. AR1 is initialized to 0 and becomes an offset into the
array. Bit-reversed addressing mode is used to modify AR1. Figure 1 shows the
arrangement, Figure 2 shows an assembly language version, and Figure 3 shows a C
version which uses in-line assembly to permit bit-reversed addressing.
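The idea can also be modeled in portable C. The sketch below is only an illustration of the base-plus-offset scheme, not the 'C3x code itself: the names bitrev_offset and read_bitrev are hypothetical, the base pointer plays the role of AR2, and the computed offset plays the role of AR1.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low `bits` bits of `i` (for bits = 3: 1 -> 4, 3 -> 6). */
static unsigned bitrev_offset(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | ((i >> b) & 1u);
    }
    return r;
}

/* Copy n = 2^bits elements from `base` into `out` in bit-reversed order.
 * `base` (the AR2 role) may point anywhere; no n-word alignment is needed
 * because only the offset (the AR1 role) is bit-reversed. */
static void read_bitrev(const int *base, unsigned bits, int *out)
{
    assert(bits < 16);                 /* keep the sketch to small arrays */
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        out[i] = base[bitrev_offset((unsigned)i, bits)];
    }
}
```

With the 15-element array used in the C figure and a base of x + 7, the eight values are read in the order 8, 12, 10, 14, 9, 13, 11, 15.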



[Figure 1: AR2 points to the start of the data array; AR2+AR1 addresses the current
element. The value of AR1 varies from 0 to n-1 in bit-reversed order.]

Figure 1. Solution diagram
        .data
table   .word   8,9,10,11,12,13,14,15
taddr   .word   table
        .text
        .global _main
_main   ldp     taddr
        ldi     @taddr,ar2      ; pointer to array
        ldi     4,ir0           ; 1/2 array size for bit-rev addressing
        ldi     0,ar1           ; first address in bit-rev list
        ldi     7,rc
        rptb    endloop
        ldi     ar1,ir1         ; put new offset into index register
                                ; This inst may also be put in parallel if
                                ; the right application comes along.
        ldi     *+ar2(ir1),r0   ; r0 holds array elements one at a time
                                ; so that results can be observed
endloop
     || ldi     *ar1++(ir0)B,r7 ; calculate next address in parallel
                                ; r7 is a dummy variable to allow parallel ops
        rets

Figure 2. Assembly code



int x[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
int *y = (int *)&x;
int m;

main()
{
    int i;
    y += 7;                                     /* start with non-aligned array element */
    asm("       ldi     @_y,ar0");              /* ar0 points to array */
    asm("       ldi     0,ar2");                /* index for bit-rev */
    asm("       ldi     4,ir0");                /* set up for bit-rev */

    for (i = 0; i < 8; i++)
    {
        asm("       ldi     ar2,ir1");          /* load index of array */
        asm("       ldi     *+ar0(ir1),r7");    /* traverse */
        asm("    || ldi     *ar2++(ir0)b,r6");  /* array with */
        asm("       sti     r7,@_m");           /* bit-rev offset */
    }
}

Figure 3. C code
Optimizing Control Algorithms on 'C5x

Contributed by Alex Tessarolo 



Design Problem In many control algorithms, a value resides in the 32-bit accumulator that must be
either stored to a 16-bit memory location or to a peripheral device which may be less
than or equal to 16 bits in resolution, e.g., an 8-bit A/D converter. Prior to storage, a
range check must be performed on the sign bit (S) and the guard bits (G) in the
accumulator. For positive numbers within range of the desired value, S = G = 0, and
for negative numbers, S = G = 1. If this is not the case, then overflow has occurred
and the value stored must be saturated.

[Figure 1: layout of the 32-bit accumulator from bit 31 down to bit 0, showing the sign
bit, the guard bits (G), the sign bit (S) of the desired value, the desired value, and
the don't-care bits. The value stored (16 bits) is marked, along with the
ACC(+max/-max) and NewACC(+max/-max) limits.]

Figure 1.

Standard published code to perform this saturation can take up to 15 cycles. How
do you minimize this overhead?

Solution A technique for doing this operation which requires a minimum number of cycles is
described below:

1. Calculate the difference (Diff) between the ACC positive maximum value
[ACC(+max)] and the desired positive maximum value [NewACC(+max)]:

Diff = ACC(+max) - NewACC(+max)

2. Make sure saturation mode is on (SOVM).

3. Execute the following instructions:

ADDH Diff ;Step 1
SUBH Diff ;Step 2
SUBH Diff ;Step 3
ADDH Diff ;Step 4, Value to be stored is either
          ;saturated or unchanged if within range



The above operation is shown next. 



[Figure 2: two number-line diagrams, Example 1 (value outside range) and Example 2
(value within range), each marking ACC(+max), NewACC(+max), zero, NewACC(-min),
and ACC(-min).]

Figure 2.
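The four-instruction sequence can be checked with a small C model. This is a sketch under stated assumptions rather than 'C5x code: sat_add32 stands in for the accumulator with saturation mode on, Diff is applied to the upper 16 bits as ADDH and SUBH do, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit add that clamps on overflow, modeling the accumulator with
 * saturation mode (SOVM) enabled. */
static int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Model of ADDH/SUBH/SUBH/ADDH Diff. diff_hi is the 16-bit Diff value;
 * ADDH and SUBH operate on the upper half of the accumulator. */
static int32_t clamp_acc(int32_t acc, uint16_t diff_hi)
{
    assert(diff_hi <= 0x7FFF);                /* Diff is a positive value  */
    int32_t d = (int32_t)diff_hi * 65536;     /* place Diff in bits 31..16 */
    acc = sat_add32(acc,  d);  /* Step 1: too-large values hit ACC(+max)   */
    acc = sat_add32(acc, -d);  /* Step 2: back to value or NewACC(+max)    */
    acc = sat_add32(acc, -d);  /* Step 3: too-small values hit ACC(-min)   */
    acc = sat_add32(acc,  d);  /* Step 4: back to value or NewACC(-min)    */
    return acc;
}
```

For example, with Diff = 7F00h the allowed range becomes FF000000h to 00FFFFFFh: 01234567h is clamped to 00FFFFFFh, while 00123456h passes through unchanged.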
TMS320C30 Addressing up to 68 Gigawords

Contributed by Randy Restle 



Design Problem The primary bus has 24 address lines which allow addressing up to 16 megawords of
memory. The expansion bus has 13 address lines addressing 8 Kwords. How can they
be used together to address a larger memory space?

Solution This technique uses the expansion bus address lines [XA(12-0)] simultaneously with
the primary address lines [A(23-0)] to extend the address to 36 bits. The feature
that is used is a power-saving feature of the 'C3x family that holds the past address
bits on an external bus until a new external access occurs (i.e., the A-Bus works as a
latch). The following parallel instruction accomplishes this task:



        STI     Rx,*ARn         ; address MSTRB while loading a
     || LDI     *ARp,Rq         ; value from STRB memory

where:

Rx and Rq designate registers R0 to R7 (but not the same register)
ARn and ARp designate auxiliary registers AR0 to AR7 (but not the same register)

Note: ARn contains the 8-megaword segment address plus 800000h. ARp contains
the address within the 8-megaword segment and is between 0 and 7FFFFFh.
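As an illustration of the note above, the following C sketch splits a 36-bit word address into the two pointer values. It assumes the segment number is simply the upper 13 bits of the address; the type ext_pointers and the function split_address36 are hypothetical names, not part of any TI tool.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t arn;   /* segment number plus 800000h (drives XA(12-0))     */
    uint32_t arp;   /* offset within the 8-megaword segment, 0..7FFFFFh  */
} ext_pointers;

/* Split a 36-bit word address into the values loaded into ARn and ARp. */
static ext_pointers split_address36(uint64_t addr)
{
    assert(addr < (UINT64_C(1) << 36));        /* 13 + 23 address bits */
    ext_pointers p;
    p.arn = 0x800000u + (uint32_t)(addr >> 23);
    p.arp = (uint32_t)(addr & 0x7FFFFFu);
    return p;
}
```

Segment 5 at offset 123456h, for instance, gives ARn = 800005h and ARp = 123456h.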



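[Figure 1: block diagram of the 'C30 connected to a memory array. The 'C30 signals
shown are A(23) (no connect), A(22:0), STRB, MSTRB, and XA(12:0); the memory array
shows its address inputs and CS.]

Figure 1. Solution diagram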
'C5x EVM Provides for Audio Processing

Contributed by Joe George

Design Problem How do I interface a 'C5x to analog signals?

Solution The TMS320C5x EVM, with its input/output RCA jacks and TLC32046 Analog
Interface Chip (AIC), offers an easy-to-configure system for audio processing.
Whether implementing simple delays and echoes or complex multi-filter digital
reverberation, 'C5x features make it easy. Some simple artificial digital reverberation
code with absolute delay that simulates four echoes in a room is available on the
TMS320 BBS in the file C5XEVAUD.EXE.

The code first initializes the AIC and then sets up the 'C5x's two circular buffer
pointers to point at different places in a single 32K-word buffer (see Figure 1). This
configuration corresponds to a receive pointer and a transmit pointer. In the serial
port receive interrupt service routine (ISR) (Example 1), the 'C5x receives and stores
data (at sample n) in the single buffer, but manipulates and transmits data stored
earlier (n-absdelay) in the buffer. This permits a programmable absolute delay,
which makes any processing effects more obvious. The manipulation in the ISR shows
the scaling of four echoes at 1 (at n-absdelay), 1/2 (at n-absdelay-1*INDX),
1/4 (at n-absdelay-2*INDX), and 1/8 (at n-absdelay-3*INDX). The echoes
are then summed.
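The echo sum can be sketched in C to make the arithmetic concrete. This is a simplified model, not the EVM code: the function name echo_mix and its parameters are hypothetical, the buffer is indexed from 0 rather than placed at a fixed address, and accumulator overflow is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_WORDS 32768u   /* 32K-word circular buffer */

/* Sum four echoes scaled by 1, 1/2, 1/4 and 1/8, taken at n-absdelay,
 * n-absdelay-INDX, n-absdelay-2*INDX and n-absdelay-3*INDX, then clear
 * the two LSBs, which the AIC treats as command bits. */
static uint16_t echo_mix(const int16_t *buf, uint32_t n,
                         uint32_t absdelay, uint32_t indx)
{
    assert(absdelay < BUF_WORDS && indx < BUF_WORDS);
    int64_t acc = 0;
    for (uint32_t k = 0; k < 4; k++) {
        /* Unsigned wraparound plus the mask gives the circular index. */
        uint32_t idx = (n - absdelay - k * indx) & (BUF_WORDS - 1u);
        /* Shifts of 16, 15, 14, 13 expressed as multiplications. */
        acc += (int64_t)buf[idx] * (int64_t)(65536 >> k);
    }
    int64_t hi = acc / 65536;           /* take the upper half ...        */
    if (acc % 65536 < 0) hi -= 1;       /* ... rounding toward -infinity  */
    return (uint16_t)((uint16_t)hi & 0xFFFCu);
}
```

Changing indx in this model corresponds to changing the inter-echo delay held in the index register.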




[Figure 1: circular buffer diagram marking the transmit pointer at n-absdelay, the
next echo tap at n-absdelay-INDX, and address C000h.]

Figure 1. 32K-word 'C5x circular buffer for reverberation
The delay between the echoes is previously stored in the index register. This
index may be varied in code to hear the effects of various inter-echo delays. To
prevent the circular addressing from indexing an invalid address, write a 1 to the
most significant bit (MSB) of the indirect address, since the MSB of every address
in the buffer (8000h to FFFFh) is always 1. This is done using the Parallel Logic
Unit. A NOP is inserted for pipeline considerations as explained in Section 3.6.2 in
the TMS320C5x User's Guide. Be sure to mask off the lowest two bits as zeroes at
the end of the ISR since they are command bits for the AIC.





Example 1:

; SERIAL PORT RECEIVE INTERRUPT SERVICE ROUTINE
receive:
        MAR     *,AR7           ; Receive Pointer
        LACC    DRR             ; Read sample from AIC
        SACL    *+              ; Store to buffer (at n)
        MAR     *,AR6           ; Transmit Pointer
        SAR     AR6,60h         ; Store Temp
        LACC    *0-,16          ; Load previous value
                                ; (at n-absdelay)
                                ; (Scale by 1)
        OPL     #8000h,AR6      ; Write 1 for addressing
        NOP
        ADD     *0-,15          ; Add previous value
                                ; (at n-absdelay-INDX)
                                ; (Scale by 1/2)
        OPL     #8000h,AR6      ; Write 1 for addressing
        NOP
        ADD     *0-,14          ; Add previous value
                                ; (at n-absdelay-2*INDX)
                                ; (Scale by 1/4)
        OPL     #8000h,AR6
        NOP
        ADD     *0-,13          ; Add previous value
                                ; (at n-absdelay-3*INDX)
        AND     #0FFFCh,16      ; Mask off lowest 2 bits
        SACH    DXR             ; Transmit to AIC
        LAR     AR6,60h         ; Load back temp
        MAR     *+,AR6          ; Increment by one sample
        RETE

Figure 2.
Circular Buffer in Second Generation DSPs

Contributed by Randy Restle



Design Problem 



Third and fourth generation DSPs (TMS320C3x/4x) include a circular buffer
addressing mode. The second generation DSPs (TMS320C2x) do not.

The integer TMS320 devices make the use of a circular buffer unnecessary because
they can perform data movement simultaneously with arithmetic processing,
with no penalty to code size or execution time. This is very efficient because it
circumvents the overhead of maintaining a buffer pointer. However, some applications
still benefit from circular buffers. An example is a decimation filter because multiple
data values must be skipped. In this case, it is usually more efficient to add an offset
to a pointer rather than perform multiple data movements.

The TMS320C25 can manipulate circular buffer pointers without penalty to code
size or execution time. This is done by using its integral bit-reversed addressing
capability normally used in FFT solutions. In this mode, carries from each bit of the
addition of AR0 and the current auxiliary register are propagated to the right instead of
the left. The carry from the rightmost bit is ignored, effectively performing a modulo
N addition, where N is the size of the buffer. N must be restricted to a power of 2.

Figure 1 shows the order in which data values will be stored and their corresponding
binary addresses for a buffer of size 8. AR0 must be loaded with the size of the buffer
divided by 2, and the coefficients which are to be multiplied by the circular-buffered
values must be stored in a corresponding bit-reversed fashion.

Order   Index
  1      000
  2      100
  3      010
  4      110
  5      001
  6      101
  7      011
  8      111

Figure 1. Data order for bit-reversed circular buffer

Traversing the data in bit-reversed order does operate on every data point,
just not in linear order. For buffers where order is not important, but efficiency is
important, this method works well.
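The reverse-carry addition described above can be modeled bit by bit in C. This is a sketch for illustration only; the function name reverse_carry_add is hypothetical and the model handles just the low nbits bits that form the buffer index.

```c
#include <assert.h>

/* Add b to a over nbits bits with carries propagating to the right
 * (from the MSB toward the LSB). The carry out of the rightmost bit is
 * dropped, which makes the result wrap modulo 2^nbits. */
static unsigned reverse_carry_add(unsigned a, unsigned b, unsigned nbits)
{
    assert(nbits > 0 && nbits <= 16);
    unsigned result = 0;
    unsigned carry = 0;
    for (unsigned bit = nbits; bit-- > 0; ) {      /* MSB first */
        unsigned x = (a >> bit) & 1u;
        unsigned y = (b >> bit) & 1u;
        result |= (x ^ y ^ carry) << bit;
        carry = (x & y) | (carry & (x ^ y));
    }
    return result;
}
```

Starting from index 0 with a step of 4 (half of a buffer of size 8), repeated calls visit 4, 2, 6, 1, 5, 3, 7 and then return to 0, so every location is used once before the pointer wraps.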
Bit-reversed Addressing in C on the 'C3x



Contributed by Tim Grady 



Design Problem Suppose a C programmer wanted to take advantage of bit-reversed addressing. The 
C compiler does not support it. How does the programmer embed assembly language 
statements into the C code to do this? 



Solution An example of how to do this is shown in Figure 1.



#define N 16

int x[N] = { 0,8,4,12,2,10,6,14,1,9,5,13,3,11,7,15 };
int y[N] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 };
/* int bitrev(int m, int n); */

void main()
{
    int i;

    asm("       PUSH    AR5");
    asm("       PUSH    AR0");
    asm("       LDI     8,IR0           ; initialize ir0 to 1/2 n");
    asm("       LDI     @CONST+0,AR5    ; AR5 <- address of x[]");
    asm("       LDI     @CONST+1,AR0    ; AR0 <- address of y[]");

    for (i = 0; i < N; i++)
    {
        /* y[bitrev(i,N)] = x[i]; */
        asm("       LDI     *AR5++(IR0)b,R0");
        asm("       STI     R0,*AR0++");
    }

    asm("       pop     ar0");
    asm("       pop     ar5");

    /* These statements place x and y in .bss and make their addresses
       available via the CONST table. */
    asm("       .bss    CONST,2");
    asm("       .sect   \".cinit\"");
    asm("       .word   2,CONST");
    asm("       .word   _x");
    asm("       .word   _y");
}

Figure 1. Solution example
Sharing Header Files in C and Assembly

Contributed by Alan Davis 



Design Problem 



Sometimes it is useful to be able to define named constants that can be used in both
C and assembly language.

One method is to have separate header files that define the same symbols: a C
include file with #define directives, and an assembler include file with .set or
.asg directives. But can you have a single, shared header file that defines the
symbols once for both C and assembler?

The file shown in Figure 1 can be used normally as a C include file (ASMDEFS not
defined). It can also be used to generate an assembler include file: compile with
ASMDEFS defined and use -k to keep the output:



cl30 -. 



-k defs.h 



#define PI 3.14
#define E 2.72

#ifdef ASMDEFS /* IF DEFINED, CREATE .asg DIRECTIVES */
#define ASM_ASG(sym) asm("\t.asg\t" VAL(sym) "," #sym)
#define VAL(sym) #sym
ASM_ASG(PI);
ASM_ASG(E);
#endif /* ASMDEFS */





The output is the file defs.asm, which contains .asg directives for your symbols.
See Figure 2.

Figure 2. Output file defs.asm

You can then .include this file in your assembly modules. The same technique
can be used to create .set directives rather than .asg.
Here's how it works: The ASM_ASG macros in defs.h expand to asm statements
containing the .asg directives. The trick is in generating both the
name and the value of the argument symbol. ASM_ASG accomplishes this
with ANSI C's new stringize operator, #. The last expression in ASM_ASG's
definition, #sym, simply makes a string out of the argument without expanding
it. Thus, #PI becomes "PI". The second expression in ASM_ASG's definition
calls another macro, VAL, which, in turn, stringizes its argument. But in passing
sym to VAL, PI is expanded (to 3.14), so VAL returns "3.14". The result:

asm("\t.asg\t" "3.14" "," "PI")

In ANSI C, adjacent strings are concatenated, so this compiles down to a
simple .asg directive in defs.asm.
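The two-level expansion can be tried on any host C compiler by building the directive text as an ordinary string instead of passing it to asm(). The sketch below does that; ASG_TEXT, pi_directive, and same_text are hypothetical names introduced only for this illustration, and the size check uses C11 static_assert.

```c
#include <assert.h>
#include <string.h>

#define PI 3.14
#define VAL(sym) #sym                                /* stringize the expanded value */
#define ASG_TEXT(sym) "\t.asg\t" VAL(sym) "," #sym   /* #sym keeps the name          */

/* The concatenated literal is tab, ".asg", tab, "3.14", ",", "PI":
 * 13 characters plus the terminating NUL. */
static_assert(sizeof(ASG_TEXT(PI)) == 14, "unexpected directive length");

/* Return the directive text that ASM_ASG(PI) would hand to asm(). */
static const char *pi_directive(void)
{
    return ASG_TEXT(PI);
}

/* Small helper: nonzero when two strings are identical. */
static int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Printing pi_directive() shows a tab, .asg, another tab, and then 3.14,PI, which is the line the assembler would see.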
Initializing the Fixed-Point EVM's AIC

Contributed by Jason Chyan 



Design Problem How do I program the AIC registers for a given sampling rate, fs, and low-pass filter
cutoff frequency, fc?

There are two pairs of registers, TA, TB and RA, RB. The T and R mean transmit
and receive. Both pairs work the same way, so only one pair will be discussed here.
The TA and TB registers can be written to via the DSP's serial port. The word sent
to the AIC must have the two LSBs of the data word programmed to indicate that a
control word is present. Typically, these two bits are 11. After receiving a data word
with the two LSBs programmed as 11, the AIC will send another FSX signal, delayed
by four shift clocks, to request that the DSP send the control word. The two LSBs of
the control word are programmed as 00 to program the TA and RA registers, and as
10 to program the TB and RB registers.

A second register, TA', may also be programmed. The two LSBs for the control
word to program the TA' and RA' registers are 01. The TA' register will cause a
small change in the sampling frequency. The two LSBs of the data word are again
used to program the use of the TA' register. TA+TA' is programmed as 01 while
TA-TA' is programmed as 10.
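The control-word layout used by the code example in Figure 1 (LAC TA,9 followed by ADD RA,2, plus the two select bits) can be written as a C helper. This is a sketch that assumes exactly that layout; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two LSBs of the control word select which register pair is written. */
enum aic_reg_select {
    AIC_SEL_A       = 0,   /* 00: TA and RA   */
    AIC_SEL_A_PRIME = 1,   /* 01: TA' and RA' */
    AIC_SEL_B       = 2    /* 10: TB and RB   */
};

/* Pack a transmit value (shifted left 9) and a receive value (shifted
 * left 2) with the select bits, mirroring LAC TA,9 / ADD RA,2 / ADDK. */
static uint16_t aic_control_word(unsigned t, unsigned r,
                                 enum aic_reg_select sel)
{
    assert(t < 64 && r < 64);      /* values must fit their bit fields */
    return (uint16_t)((t << 9) | (r << 2) | (unsigned)sel);
}
```

With TA = RA = 18 this gives 2448h, and with TB = RB = 18 it gives 244Ah.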

There are three equations you can use to determine fs and fc.

fc = fm/(72*TA)                         ; given a master clock fm = 10.368 MHz
fs = (36/TB)*fc                         ; TA' not used, LSBs = 00
fs = (36*fc*fm)/(TB*fm + 36*fc*TA')
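The three equations translate directly into C. The sketch below assumes the 10.368 MHz master clock stated above and works in kHz; the function names are hypothetical.

```c
#include <assert.h>

#define AIC_FM_KHZ 10368.0   /* master clock fm = 10.368 MHz, in kHz */

/* fc = fm/(72*TA) */
static double aic_fc_khz(unsigned ta)
{
    assert(ta > 0);
    return AIC_FM_KHZ / (72.0 * ta);
}

/* fs = (36/TB)*fc, TA' not used */
static double aic_fs_khz(unsigned ta, unsigned tb)
{
    assert(tb > 0);
    return (36.0 / tb) * aic_fc_khz(ta);
}

/* fs = (36*fc*fm)/(TB*fm + 36*fc*TA') */
static double aic_fs_adjusted_khz(unsigned ta, unsigned tb, unsigned ta_prime)
{
    assert(tb > 0);
    double fc = aic_fc_khz(ta);
    return (36.0 * fc * AIC_FM_KHZ) / (tb * AIC_FM_KHZ + 36.0 * fc * ta_prime);
}
```

For TA = 18 and TB = 18 the first two functions return 8 kHz and 16 kHz, and any nonzero TA' makes the third result slightly lower than the second.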





Table 1. TA and TB vs. fc and fs

TA    fc (KHz)        TB    fs/fc
31      4.6           63    0.57
29      5.0           36    1.0
24      6.0           18    2.0
21      6.8           12    3.0
18      8.0            9    4.0
16      9.0            6    6.0
14     10.3
 9     16.0
 6     24.0







The following examples illustrate the use of Table 1. 
Suppose fs = 16 KHz and fc = 8 KHz are desired.

fc = 10368/(72*18) = 8                  ; TA = 18
fs = (36/18)*8 = 16                     ; TB = 18

If TA' = 20 is used, the following calculation results:

fs = (36*8*10368)/(18*10368+36*20) = 15.756

Clearly, TA' reduced f s , but not much. It is used in modem applications to ad- 
vance or i 



Other examples:

a. fc = 16 KHz                          ; TA = 9
   fs = 16 KHz                          ; TB = 36

b. fc = 6 KHz                           ; TA = 24
   fs = 18 KHz                          ; TB = 12

Some other caveats include:

1. fc min = 4.6 KHz     where TA = 31
2. fs min = 2.622 KHz   where TB = 63 and TA = 31
3. fc max = 28.8 KHz    where TA = 5 (min allowed value)
4. fs max = 25 KHz      the maximum conversion rate for AIC





        .mmregs
        .global START, AICINIT, AIC_2ND
        .data
TA      .word   18              ; fc = 8 KHz
RA      .word   18              ; fc = 8 KHz
TAp     .word   31
RAp     .word   31
TB      .word   18              ; fs = 2 * fc
RB      .word   18
AIC_CTR .word   8Dh
ACC_lo  .word   0
ACC_hi  .word   0
TEMP    .word   0
* initialization
        .text
START:  DINT                    ; disable interrupts
        LDPK    #0              ; data page pointer = 0
        LARP    0               ; point to AR0
        CALL    AICINIT         ; initialize AIC and enable ints
* put main program here
        LACK    #010h           ; use RINT as sync for
        SACL    IMR             ; TX and RX
AICINIT:
        SFSM                    ; non-continuous mode
        RTXM                    ; FSX as input
        FORT    0               ; 16-bit words
        LALK    #0ffefh         ; Pulse AIC reset by setting it low
        SACL    TEMP
        OUT     TEMP,PA2        ; Write to AIC
        RPTK    #255            ; and then taking it high after 10k cycles
        NOP                     ; (.5ms at 100nS)
        RPTK    #243
        NOP
        LALK    #0FFFFh
        SACL    TEMP
        OUT     TEMP,PA2
        LDPK    0
        LACK    020h
        SSXM
        SACL    IMR             ; XINT interrupt
        LAC     TA,9            ; initialize TA register
        ADD     RA,2
        CALL    AIC_2ND
        LAC     TAp,9           ; initialize TA'
        ADD     RAp,2
        ADDK    01h
        CALL    AIC_2ND
        LAC     TB,9            ; initialize TB register
        ADD     RB,2
        ADDK    02h
        CALL    AIC_2ND
        LAC     AIC_CTR,2       ; initialize control register
        ADDK    03h
        CALL    AIC_2ND
        RET
AIC_2ND:
        LDPK    0
        SACH    DXR             ; load transmit data register
        IDLE                    ; wait for int
        ADLK    6,16
        SACH    DXR
        IDLE                    ; ACC_hi requests 2nd XMIT
        SACL    DXR
        IDLE                    ; ACC_lo sets up registers
        ZAC
        SACL    DXR             ; make sure word was sent
        RET

Figure 1. TMS320C25 code example
TMS320C25 Logical Shifts in Parallel with ALU Operations

Contributed by Keith Larson 



Design Problem 



Is there a way to perform a logical shift in parallel with the ALU's normal 
operations? 

With an easy trick, a logical right or left shift can be accomplished in parallel with 
another instruction without disturbing the accumulator, multiplier, or any other part 
of the ALU. 

The trick involves thinking differently about how to use the Auxiliary Register 
Arithmetic Unit (ARAU). The ARAU is capable of incrementing, decrementing, 
and index register modification, as well as the following two important features. 

First, to double the value of a number, add it to itself. The ARAU can have the
current ARP=0 such that a *0+ modification will add AR0 to itself. In code:

        LRLK    AR0,Value       ; load a value into AR0
        LARP    AR0             ; point the current ARP to AR0
        MAR     *0+             ; add AR0 to itself (logical left shift!)



Second, consider how bit-reversed carry addition is performed in the ARAU. 
The logic of the ARAU is designed to propagate the carries from any half adder to 
the right rather than to the left as in normal addition. One way to remember how 
bit-reversed carry addition works is to think about looking at the inputs and outputs 
through a mirror, reversing the order. This causes the LSBs to switch with the 
MSBs, which is another way to think about bit-reversed carry addition. Figure 1 
shows an AR0 bit reverse added to itself (ARP=0). Figure 2 shows what is normally
used in FFT bit reversals and other DSP algorithms (ARP != 0), with a "mirror" line 
drawn in for reference. 



        LRLK    AR0,07191h
        LARP    AR0
        MAR     *BR0+           ; Note carries propagate right

          0111000110010001  -- AR0
        + 0111000110010001  -- AR0
          ----------------
          0011100011001000  -- New AR0

        (each carry moves one bit to the right; the last carry is lost)

Figure 1.
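The mirror view of reverse-carry addition is easy to model in C: mirror both operands, add normally, and mirror the sum. The sketch below is an illustration only, with hypothetical function names, and shows both shifts on a 16-bit value standing in for AR0.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the 16 bits (the "mirror"). */
static uint16_t mirror16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Reverse-carry add: a normal add performed on the mirrored operands. */
static uint16_t reverse_carry_add16(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)mirror16(a) + (uint32_t)mirror16(b);
    return mirror16((uint16_t)(sum & 0xFFFFu));   /* carry out is lost */
}

/* MAR *0+ with ARP=0: AR0 + AR0, a logical left shift. */
static uint16_t arau_shift_left(uint16_t ar0)
{
    return (uint16_t)((ar0 + ar0) & 0xFFFFu);
}

/* MAR *BR0+ with ARP=0: AR0 reverse-carry added to itself, a logical
 * right shift. */
static uint16_t arau_shift_right(uint16_t ar0)
{
    uint16_t r = reverse_carry_add16(ar0, ar0);
    assert(r == (uint16_t)(ar0 >> 1));            /* sanity check */
    return r;
}
```

For the value in Figure 1, arau_shift_right(0x7191) returns 38C8h.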



        LRLK    AR1,0100h
        LRLK    AR0,0080h
        LARP    AR1
        RPTK    7
        MAR     *BR0+

                                   Mirror Line
                LSB          MSB        |                    LSB
                0000100000000000        |       0000000000010000
        *BR0+   0000000010000000        |     + 0000000100000000
     AR1 bits   0000100000000000        |       0000000000010000
                0000100010000000        |       0000000100010000
                0000100001000000        |       0000001000010000
                0000100011000000        |       0000001100010000
                0000100000100000        |       0000010000010000
                0000100010100000        |       0000010100010000
                0000100001100000        |       0000011000010000
                0000100011100000        |       0000011100010000
                0000100000010000        |       0000100000010000

                Bit reversed carry -->  |  <-- Normal carry

Figure 2.









This trick is useful as a logical shifter that does not use the accumulator in any
way. It is also helpful for performing a decimation in frequency FFT. In this case
the DFT block size decreases by 1/2 for every stage of the FFT. When completed, the
DFT block size will be two and the address offset one. By using a 'BANZ
Not_done,*BR0+', a good deal of code is eliminated in a tightly-looped and
reasonably efficient FFT. The value of AR0 can at the same time be used to
access a bit-reversed twiddle table lookup. The same lookup table will work for
any size FFT smaller than the overall size of the table permits.

The code for this FFT, written as a complete spectrum analyzer setup for the
'C2x SWDS and AIB2, is available on the TMS320 BBS (713-274-2323). This
same code also works with the 'C26. The file to download is C2X_ANAL.EXE, a
self-extracting PKZIP file. Also available on the BBS is code to perform successive
approximation routines. A 32-bit integer square-root routine can be found in the
file BFLTLIB.EXE.
TMS320C40 Boot Loader Selection

Contributed by Daniel Chen 



Design Problem How do I set up to use the boot loader from a system design point of view?

Solution The TMS320C40 includes a boot loader to allow users to load and execute programs
from a host processor, inexpensive ROM, or other standard memory devices.

The boot loader function is selected by 1) setting RESETLOC(1,0) = 00b and
2) driving the on-chip ROM enable pin (ROMEN) high when resetting the processor.
After reset, the loader mode is determined by the status of the IIOF3-1 pins,
which are configured as general-purpose inputs at reset. Although the IIOF0 pin is
not used for boot load options, it is assumed to be high. The status of the IIOF3-0
pins is read by polling the IIOF flags in the CPU register IIF. The options are listed
in Figure 1.



IIOF3   IIOF2   IIOF1   IIOF0   FUNCTION
  1       1       0       x     Memory boot loader from 0x00300000
  1       0       1       x     Memory boot loader from 0x40000000
  1       0       0       x     Memory boot loader from 0x60000000
  0       1       1       x     Memory boot loader from 0x80000000
  0       1       0       x     Memory boot loader from 0xA0000000
  0       0       1       x     Memory boot loader from 0x00000000
  0       0       0       x     Reserved
  1       1       1       x     Communication port boot loader

Figure 1. Boot loader options



[Figure 2: two circuits, each built around a 74S175 flip-flop (D, CLK, CLR, Q) with
IACK, RESET, and INTn inputs and a 4.7K pullup to +5V on the IIOFn output. The first
circuit is labeled "IIOFn signal is low for boot loader selection" and includes a 20K
resistor to +5V; the second is labeled "IIOFn signal is high for boot loader selection"
and includes a connection to GND and an AND gate.]

Figure 2. Circuit for generation of IIOF signal for boot loader selection



To select the correct boot loader mode, the IIOF3-0 pins must hold a valid status
value for a certain time period (refer to the TMS320C40 boot loader program in
Section 13.2.7 of the TMS320C4x User's Guide for detailed information) and
the ROMEN pin should be high at all times before host load is completed. Since
the IACK signal will be brought low for one cycle after the boot load is completed
(see note), it can be used as a termination signal for boot load. Figure 2
shows a sample circuit to generate the IIOF3-0 signals for boot load selection
when the user wishes to use the IIOFn flag for boot load control and
interrupt/general-purpose I/O. If an IIOFn pin is used only for boot loader selection,
pullups or tie downs can be used. In this example, after reset, the IIOF pins will
stay low until the IACK signal is received.

Note: The address for the IACK instruction should be pointed to external 
memory where the ready signal is applicable. 
Reducing System Power Requirements

Contributed by Peter Ehlig



Design Problem What steps can I take to reduce overall power requirements of my system?

Solution With the ever-growing need for battery-operated systems, the need for low-power
designs has increased significantly in recent years. These low-power applications
have expanded to include high-speed DSP designs. So, it is necessary to design with
high-speed devices while maintaining an overall power reduction. There are a number
of ways to reduce your system power, ranging from the use of CMOS logic to
varying the clock rates of the logic to powering down unused logic.

Semiconductor processing in CMOS results in devices inherently lower in power 
dissipation than their BIPOLAR or NMOS counterparts. This is primarily due to 
the fact that once a CMOS gate has stabilized at a level, it requires little or no power 
to stay in that state. The equivalent NMOS or BIPOLAR gate requires power to 
maintain that level. 

The TMS320 devices use fast buffers to improve the access time of external
resources. These fast drivers can be a significant part of the total power used by the
device if care is not taken in the design. First, the on-chip ROM/EPROM and
RAM of the TMS320 devices are inside the large drivers, so it takes significantly less
power to access these memories than to access external memories. In
many applications there are segments of the code that are accessed significantly
more often than the rest of the code. These segments can either be masked into the
ROM, programmed into the EPROM, or boot loaded into the RAM.

Second, minimizing the fanout drive of the output buffers will help minimize the 
power requirements. Collapsing the glue logic into PALs or ASICs will reduce the 
number of inputs these fast buffers must drive. 

Third, power consumption is minimized when all unused input-only pins are
connected to high voltage or ground. This ensures that the inputs to the CMOS gates of
the device are not floating (not switching).

Power consumption varies with voltage level. If Vcc is held between 4.75 V
and 5.0 V, the DSP device will consume less power than if it is run above 5.0 V. The
TMS320 family also includes low-voltage devices like the TMS320LPlx family of
3.3 V devices. The TMS320C5x generation of devices also supports low-voltage
operation. If the temperature environment of the system can be regulated to within
a moderate range, the power consumption of the TMS320 devices can be reduced.
In the case of hand-held instruments, the fact that they are held in a hand indicates
the temperature is likely well inside the operating range of the device, as 0°C and
70°C are both rather uncomfortable environments for your hands.
Some of the TMS320 devices have power-down modes. These modes include
IDLE instructions, HOLD modes, and device power-down switches. The IDLE
instructions shut down part or all of the CPU operations of the device, thus
reducing the amount of logic that is switching. The TMS320C5x devices also
include a second IDLE instruction that shuts down virtually all the logic on the
device, thus reducing the required current to around 1 mA. The HOLD signal
can also stop the internal CPU of the TMS320C2x and TMS320C5x devices if
the HM status bit is set to one. The HOLD also puts all the memory interface
signals in a high-impedance state.

The power required by the TMS320 device is directly proportional to the
frequency at which it is operating (see Figure 1 and Figure 2). Therefore, if there are
times when the system does not require the full computational capability of the
DSP, then the input clock can be slowed to reduce the system power. A simple
divide down of the input clock can significantly reduce the power. The
TMS320C5x devices are implemented in static logic so the input clock can be
stopped, reducing the required current to µAs. Some of the devices (such as the
TMS320C28) include power-down transistors. These transistors remove power
from the CPU while maintaining power to the on-chip RAM. This allows the
system to save key registers in the RAM and power down the CPU while the system
does not require the DSP, and then quickly restore the registers from the RAM once
the system needs the DSP again.



[Figure 1: plot of Typical Icc vs. Frequency for the TMS320C10 (0-70°C temperature
range); horizontal axis: Frequency (MHz), 4 to 28.]

Figure 1.

In summary, the guidelines below can be followed to reduce the active power: 

• Use CMOS devices. 

• Use on-chip memory. 

• Minimize fanout via integration (ASIC or PAL). 

• Lower voltage. 

• Control temperature. 
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The guidelines below can be followed to reduce power when the system is 
inactive: 

• IDLE the CPU.

• Slow down or stop the input clock.

• Switch off the power to dormant circuitry of the system.
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TMS320C5x Power Dissipation Characteristics 
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Figure 2. 
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Interfacing the 

Contributed by Leor Brenman 



Design Problem What interface circuitry is required to connect a 'C31 to A/D and D/A?

Solution The TMS320C3x DSPs are designed to interface easily with 16-bit A/D and D/A
devices for audio and data acquisition applications. Figure 1 shows how
to interface the TMS320C31, with zero glue logic, to Burr-Brown's DSP201/2 and
DSP101/2 families of D/A and A/D devices. The example is an efficient, low-cost,
stereo digital audio interface using a 'C31 and the DSP202 and DSP102
dual-channel D/A and A/D chips.



[Schematic: Burr-Brown DSP102 A/D (VINA and VINB inputs, +/- 2.75 V; SOUTA, SYNC, SSF, XCLK, CONV, CASC tied to +5 V; OSC0/OSC1 with a 12.29 MHz crystal, 22 pF and 1 MOhm; TCLK1 can be used alternatively), TMS320C31 (CLKR0, CLKX0, DR0, DX0, FSR0, FSX0, TCLK0), and Burr-Brown DSP202 D/A (SINA, SINB, SYNC, SSF, XCLK, CONV, CASC, SWL; VOUTA and VOUTB outputs, +/- 3 V).]

Figure 1. 'C31 zero glue-logic interface to Burr-Brown A/D and D/A



The DSP102 A/D is interfaced to the 'C3x serial port receive side; the DSP202
D/A is interfaced to the transmit side. The A/D and D/A are hard-wired to run in
cascade mode. In this mode, when the 'C31 initiates a convert command (CONV)
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to the A/D via its TCLK0 pin, both analog inputs are converted into two 16-bit
words which are concatenated to form one 32-bit word. The A/D signals the
'C31 that serial data from the last conversion is being transmitted via the A/D's
SYNC signal. The 32-bit word is then serially transmitted, MSB first, out the
SOUTA serial pin of the DSP102 to the DR0 pin of the 'C31 serial port. The
'C31 is programmed to drive the analog interface bit clock from its CLKX0 pin.
The bit clock drives both the A/D and D/A XCLK inputs.

The 'C31 transmit clock can also act as the input clock on the receive side of
the 'C31 serial port. Since the receive clock is synchronous to the 'C31's internal
clock, the receive clock can run at full speed (even though it is an external clock).

Similarly, upon receiving a convert command (CONV), the D/A converts the
last word received from the 'C31 and signals the 'C31, via the SYNC signal, to
begin transmitting a 32-bit word representing the two channels of data to be
converted. The data, transmitted from the 'C31 DX0 pin, is input to both the SINA
and SINB inputs of the D/A.

The 'C31 is set up to transfer bits at the maximum rate of about 8 Mbits/sec
with a dual-channel sample rate of about 44.1 kHz by setting the following
registers (assuming a 32 MHz CLKIN):

Serial Port:

Port global control register               0x0EBC0040
FSX/DX/CLKX port control register          0x00000111
FSR/DR/CLKR port control register          0x00000111
Receive/Transmit timer control register    0x0000000F

Timer:

Timer global control register              0x000002C1
Timer period register                      0x000000B5

A synchronous receive interrupt service routine is sufficient for parsing and
transferring data between the serial ports and memory. Source code for setting up
the serial port and timers of the 'C31 for interfacing to the DSP102 and DSP202
can be found on the TI BBS, file name: C3XBB.EXE.
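As a sanity check on the timer period value, the conversion rate can be estimated on a host. This is a minimal Python sketch under stated assumptions that are not spelled out in the note: H1 = CLKIN/2, the timer is clocked internally at H1/2, and the timer runs in pulse mode so that one CONV pulse is produced per period count. The function name is hypothetical.

```python
def c31_sample_rate_hz(clkin_hz, timer_period):
    """Estimate the CONV (sample) rate produced by the 'C31 timer.

    Assumptions (not stated in the note): H1 = CLKIN / 2, the internal
    timer clock is H1 / 2, and the timer is in pulse mode, giving one
    output pulse every `timer_period` timer clocks.
    """
    if timer_period <= 0:
        raise ValueError("timer_period must be positive")
    h1_hz = clkin_hz / 2.0
    timer_clock_hz = h1_hz / 2.0
    return timer_clock_hz / timer_period


sample_rate = c31_sample_rate_hz(32_000_000, 0xB5)   # period register value
payload_bits_per_sec = sample_rate * 32              # one 32-bit word per sample
```

With a 32 MHz CLKIN and a period of 0xB5 (181), this model gives a rate a little above 44 kHz, consistent with the "about 44.1 kHz" figure in the text, and the 32-bit payload per sample stays well below the serial port bit clock.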
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Efficient Coding on the TMS320C5x 

Contributed by Mansoor Chishtie 



Design Problem What are some examples of software that take advantage of the 'C5x architecture?

Solution Algorithms based on dynamic programming techniques often make use of looped
code, conditional execution, min-max search, and pointer addressing. The
TMS320C5x core CPU allows zero-overhead looping, multiple-condition branches,
delayed jumps and calls, min-max instructions, and post-modify indirect addressing
to implement efficient search algorithms.



[Trellis diagram: past delay states (previous time interval) connect through path states to current delay states (current time interval).]

Figure 1. A popular Viterbi subroutine selects the "minimum cost" path



The Viterbi decoding algorithm is quite popular in data communications
applications. One subroutine used by this algorithm is presented here to demonstrate
efficient 'C5x code. The function of the subroutine is to select the "minimum cost"
path to the current delay state out of four possible paths (see Figure 1). Each path has
an associated "cost value" and each past delay state has an accumulated cost. The new
accumulated cost is computed by adding the path cost to the accumulated cost of the
past delay state. The path with the minimum accumulated cost is selected and the
rest are discarded.
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When a path link is selected, the path state value identifying the link and the
past delay state are stored in appropriate tables (PAST_PTH, PAST_DLY). The
accumulated distance and current distance tables (ACCDIST, DIST) are accessed
using indirect circular addressing. The four path states are not in binary ascending or
descending order, but a four-word circular buffer can be set up that steps through
each path state in the correct sequence. Circular addressing also resets the pointer to
the first state automatically after exiting the loop. Note that all the instructions inside
the block-repeat loop are single-cycle instructions.

The complete 'C5x Viterbi implementation is available on the BBS. 

* AR1 - ACC_DIST (set up as 4-word circular buffer)
* AR2 - DIST (set up as 4-word circular buffer)
* AR3 - MIN_DIST (minimum accumulated distance table)
* AR5 - PAST_DLY (past delay state table)
* AR6 - PAST_PTH (past path state table)

BEGIN   MAR     *,AR1           ;ARP = AR1
        SPLK    #3,BRCR         ;set up count
        LACC    #07FFFH         ;max value
        SACB                    ;ACCB=07fffh
        RPTB    LOOP-1          ;repeat 4 times
        LACC    *,0,AR2         ;Get old acc distance
        ADD     *,0,AR3         ;Add current distance
        CRLT                    ;if (ACC < ACCB) then ACCB=ACC
        SACL    *,0,AR5         ;Save new min value
        XC      2,C             ;Update PAST_PTH and PAST_DLY
        SAR     AR1,*,AR6       ;pointer to ACCDIST -> PAST_DLY
        SAR     AR2,*,AR1       ;pointer to DIST -> PAST_PTH
        MAR     *,AR1           ;ARP = AR1
        MAR     *+,AR2          ;AR1++ (circular addressing)
        MAR     *+,AR1          ;AR2++ (circular addressing)
LOOP
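The selection performed by the block-repeat loop can be modeled on a host to check expected results. The following Python sketch is a behavioral model only, not 'C5x code: the function name and the list-index bookkeeping are hypothetical stand-ins for the ACCB running minimum and the PAST_DLY/PAST_PTH pointer stores.

```python
def select_min_path(acc_dist, dist, max_cost=0x7FFF):
    """Return (minimum accumulated cost, index of the winning path).

    Mirrors the loop above: the running minimum starts at 07FFFh, each
    candidate cost is the old accumulated distance plus the current
    distance, and the minimum is replaced only on a strict "less than"
    (as with CRLT setting the carry that enables the XC block).
    """
    best_cost = max_cost
    best_index = None
    for index, (acc, cur) in enumerate(zip(acc_dist, dist)):
        cost = acc + cur
        if cost < best_cost:
            best_cost = cost
            best_index = index      # stands in for the saved AR1/AR2 pointers
    return best_cost, best_index


# Four candidate paths into one current delay state (illustrative values).
cost, winner = select_min_path([10, 4, 7, 9], [3, 8, 2, 1])
```

Because the comparison is strict, the first of two equal-cost paths wins in this model, which matches updating the tables only when the carry is set.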
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TMS320C40 DMA Memory Transfer Timing 

Contributed by Daniel Chen 



Design Problem How many cycles does it take the DMA to read/write to external memory? 

Solution To maximize CPU computational power and unload the CPU of the data transfer 
burden, the TMS320C40 provides six DMA channels (12 DMA channels in split 
mode) to handle data transfer concurrently with CPU operations. These DMA chan- 
nels and the CPU share the internal/external buses. Hence, a user-configurable 
DMA fixed/rotate priority arbitration scheme and CPU/DMA priority scheme are 
created to prioritize the bus resource conflict situations (see TMS320C4x User's 
Guide, Section 9 for more information). 

The combination of bus resource conflicts can make DMA memory transfer tim- 
ing very complicated. However, there are certain guidelines to follow to calculate 
the transfer timing for certain DMA setups. The single-channel DMA memory trans- 
fer timing with no CPU or other DMA channel conflict is discussed below. The 
actual DMA transfer timing can be obtained by combining the single-channel DMA 
transfer timing with bus resource conflict situations. 

When the DMA memory transfer has no conflict with the CPU or any other 
DMA channels, the number of cycles of a DMA transfer is dependent upon whether 
the source and destination location are designated as on-chip memory, peripheral, or 



[Timing diagram, cycles (H1) 1 through 15. Source on-chip, destination on-chip: single-cycle reads (R) and writes (W) alternate. Source local bus or global bus, destination on-chip: each read (R) is extended by Cr wait cycles before the single-cycle write (W).]

SOURCE          DESTINATION: On-chip

On-chip         (1 + 1) T
Local Bus       [(1 + Cr) + 1] T
Global Bus      [(1 + Cr) + 1] T

Figure 1. Timing and number of cycles for DMA transfers when destination is on-chip
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external ports. When an external port is used, the DMA transfer speed is affected 
by two factors: the external bus wait state and the read/write conflict (i.e., if a 
write is followed by a read, the read takes two cycles). 

Figures 1 through 3 show the number of cycles a DMA transfer requires from 
different sources to different destinations. Each entry in the table represents the 
total cycles required to do the T transfers, assuming that there are no pipeline 
conflicts. 

Accompanying each figure is a table illustrating the timing of the DMA 
transfer. 

Externally, on the global and local buses, writes take at least two cycles.
However, the CPU/DMA requires one cycle to perform the write to the external
memory bus. Therefore, the DMA/CPU can transfer data on the next cycle. For
example, the DMA transfers 1024 words from internal memory RAM block 1 to
one-wait-state memory on the global bus while the CPU runs from memory on
the local bus and fetches operands from RAM block 0. DMA transfer time is
calculated from Figure 2 as 1 + (2 + 1) × 1024 = 1 + 3072 = 3073 cycles.
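The cycle counts can be tabulated with a short host-side calculation. This Python sketch uses the Figure 1 expressions for an on-chip destination as given; the on-chip-to-external expression, 1 + (2 + Cw) × T, is inferred from the worked example above rather than taken from Figure 2, so treat it as an assumption. Function names are hypothetical.

```python
def dma_cycles_dest_on_chip(transfers, source="on-chip", read_wait=0):
    """Cycles for T transfers into on-chip memory (Figure 1 expressions)."""
    if source == "on-chip":
        return (1 + 1) * transfers
    if source in ("local", "global"):
        return ((1 + read_wait) + 1) * transfers
    raise ValueError("source must be 'on-chip', 'local', or 'global'")


def dma_cycles_on_chip_to_external(transfers, write_wait):
    """Cycles for T transfers from on-chip memory to an external bus.

    Assumed form 1 + (2 + Cw) * T, inferred from the 1024-word example
    in the text; it is not copied from Figure 2.
    """
    return 1 + (2 + write_wait) * transfers


example = dma_cycles_on_chip_to_external(1024, write_wait=1)
```

These figures apply only to a single DMA channel with no CPU or other DMA conflicts, as the text notes.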
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Figure 2. Timing and number of cycles for DMA transfers when destination is local bus 



Legend:

T = Number of transfers
Cr = Source-read wait states
Cw = Destination-write wait states
|R| = Single-cycle reads
|W| = Single-cycle writes
|R.R| = Multi-cycle reads
|W.W| = Multi-cycle writes
|Cr| = Number of wait cycles for a read
|Cw| = Number of wait cycles for a write
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Designing with TMS320C40 Comm Ports: Part 1 



Contributed by Keith Larson and Jim Patterson 



Problem What are some design issues/tips when designing with 'C40 comm ports? 

Solution The 'C40 comm port is a very-high-speed data transmission circuit. Its speed and the 
close proximity of multiple data lines create special challenges. The following cave- 
ats are intended to help you past some of the potential problem areas. 

First, a question and answer. The 'C40 User's Guide (pp. 14-33 to 14-35) says
that CSTRB is an output before CREQ goes high, but it also says CSTRB is an input
after CREQ goes high. The timing data implies that two outputs are connected
together for 0.5 p. The answer is that while both 'C40s are driving these lines for a
period of time, they are both driving in the same direction (VOH). As a result, there
is no current from one device to the other.

Signal Quality 

The transmission line aspects of the comm port circuit make it sensitive to signal 
quality. General design rules that would be applicable to high-speed (<10 ns) mem- 
ory interface design would be appropriate for 'C40 comm port interconnections. 

Further points to keep in mind include: 



1) An overlap feature is built into CREQ, CSTRB, and CRDY when a token is
transferred. This overlap will cause these signals to all drive high (at both ends),
ensuring that neither end is susceptible to floating or low-noise signals.

It is important to match the clocks, or else the original driver may not give up its
end soon enough, which causes bus contention.

2) When the token exchange occurs, the falling edge of CACK indicates that
there are no transfers in progress, so it is OK to drive both ends of CSTRB high
(1) Figure 14-23.

3) The requester then acknowledges the receipt of CACK low by driving CREQ
high and staying active high (3) Figure 14-23.

4) CREQ going high is interesting because of (5) Figure 14-24. In this case, the
rising edge of CREQ causes the CREQ input to switch over to an active output
high. At this time, both devices are driving CREQ high. The rising edge of
CREQ also causes CACK and CSTRB to change to inputs, also with only a
couple of gate delays.
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5) Finally CACK, which is now floating, is driven active high (4) Figure 14-23.

VERY IMPORTANT

6) The clocks of the two 'C40s connected together must be within a 2:1 ratio. If
this is not adhered to, the overlap will last too long and the new master (the
one with the faster clock) may start driving low before the old master has
relinquished that line. This will cause signal contention and possibly a lot of
current.



Design Hints 

• Use series resistors in all lines. This helps match the output buffer impedance 
to line impedance, protects against signal contention, and has low power dissi- 
pation. If the line length is small (6"), a single resistor in the middle can be 
used. The resistor value, plus buffer output impedance, should match the line 
impedance. The buffer output impedance is in the range of 20 to 70 ohms. A 
resistor value of 27 - 33 ohms may be a reasonable start. Some experimenta- 
tion may be needed. 



[Diagram: output buffer (Rb, ~40 Ohm) driving the input through a series resistor (RS, ~47 Ohm, lower than optimum) and a line of Z0 = 100 Ohm.]

Figure 1.




• Try to keep the line impedance as high as possible. Routing signals on a top
layer without a shield above them will help yield both clean signals and high
impedance. Do not route signals on top of each other. When it is necessary to
cross traces on adjacent layers, they should cross at right angles to reduce
coupling. High line impedance will reduce the sensitivity of the circuit to changes
in the output buffer impedance, and will be a benefit when interfacing to
external cables, as typical ribbon cable is about 100 ohms.

• Because it is sometimes difficult to route high-impedance lines (especially long
ones) in a circuit board, an external ribbon cable can be used to jump over the
length of the board. In this case, only two headers need to be installed in the
circuit board. Use an alternating signal and ground scheme. For quality signals,
a (4 control + 8 data + 1 shield) × 2 = 26-wire ribbon is needed. The shield is
needed for the signal that is otherwise on the edge.

• The driver output consists of three transistors, one pullup to VCC and two
pulldowns. The DVSS transistor (Q2) is on above 1.8 volts and the CVSS transistor
(Q3) below. The advantage of a two-transistor pulldown is two-fold. First, a
major portion of the switching noise in Q2 is dumped into DVSS and does
not corrupt the clean logic CVSS. Second, the ratio of Q2/Q3 and the 1.8-V
switching threshold provides a nearly ideal driving signal for a wide range of
transmission line impedances. Note the Ron values. They are quite low, and if
a fault occurs something will get HOT!

• If long lengths are needed or jumps to other boards are needed, then a
uni-directional data flow should be considered, as there is currently no preferred
method of buffering the token for bi-directional buffers. The best method is to use
slow buffers with hysteresis for CSTRB and CRDY. This has two advantages. It
cleans up the signals and helps eliminate glitches which can be erroneously
perceived as valid control. It also allows the data bits to settle before the receiver
sees CSTRB.

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Figure 2.
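The series-resistor hint in the design hints above (resistor value plus buffer output impedance should match the line impedance) reduces to a one-line calculation. The Python sketch below is illustrative only; the function name is hypothetical, and the result is a starting point for the experimentation the text recommends, not a final value.

```python
def series_resistor_ohms(line_impedance, buffer_impedance):
    """Series resistor that makes (resistor + buffer output impedance)
    equal to the line impedance; clamped at zero if the buffer alone
    already exceeds the line impedance."""
    if line_impedance <= 0 or buffer_impedance < 0:
        raise ValueError("impedances must be positive")
    return max(line_impedance - buffer_impedance, 0.0)


# Buffer output impedance is quoted as 20 to 70 ohms; a 100-ohm line is
# the ribbon-cable figure mentioned in the text.
for rb in (20.0, 40.0, 70.0):
    rs = series_resistor_ohms(100.0, rb)
```

For lower-impedance board traces the ideal value drops accordingly, which is one reason a value well below the 100-ohm case can be a reasonable first try.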



CAUTION: 

The width of CSTRB low should not exceed 1.0 H at the receiving end. If it does,
the byte sequencer, which has looped back to byte zero, will see this low and
recognize CSTRB as the next valid byte. This is not a problem unless you are working
with very long distances. In this case, use flip-flops to locally shorten CSTRB at
the receiver while returning a valid CRDY width to the sender. Wide widths at the
sender are not a problem. See Figure 3 for an example.
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Figure 3. 



CSTRB Circuit With Token Direction Detection 

Because all signals are bi-directional, it is difficult to determine the direction of
data transfer. A method which has been shown to work is given in Figure 3. In
this case, the rising edge of CREQ is used to toggle the previous value in a
flip-flop, giving direction. The initial state is determined by reset at power up.

Once direction is known, controlling the width of CSTRB is straightforward.
Looking at the circuit, you will notice that in one direction only CSTRB/CRDY
buffering is done at one end and a pair of SR flip-flops is in the circuit at the other.

For the data receive end (with SRs), a low incoming CSTRB will cause the
'C40's pin to go low and stay low until the 'C40 responds with CRDY low.
When CRDY falls, the 'C40's CSTRB (local) will go high, satisfying the 1-H
criterion. When CSTRB (incoming) goes back high, the SR flip-flop pair is ready to
receive another CSTRB.



Conclusion 

For distances less than 12", series resistor matching is reliable so long as the
design guidelines described above are adhered to. For distances greater than 12", a
uni-directional transfer has been shown to be reliable when all signals are
properly buffered. The width of CSTRB is important and for very long distances may
need to be controlled by external logic.
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Creating a Delay Buffer on a TMS320C2x EVM 

Contributed by Tom Homer 



Design Problem How can I implement an audio delay buffer with the 'C2x EVM? 

Solution The key to this technique is that the buffer length is equal to the sample delay time
you want to use and that the input and output rates are equal. Only one pointer is
required, and it is used for both output and input (in that order). The delayed value
is first output, and then a new input value is read into the same memory location.
Finally, the pointer is incremented to the next memory location. Because there is only
one pointer, the overhead of checking whether a pointer is at the end of the buffer is
reduced. Use a counter to determine when the pointer is at the end of the buffer. This
approach can be implemented using a BANZ instruction.
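The single-pointer scheme can be checked with a small host-side model. The Python sketch below is a behavioral model, not 'C2x code: the class and method names are hypothetical, and the down-counter stands in for the auxiliary register tested by BANZ.

```python
class DelayBuffer:
    """Single-pointer delay line: output the old sample, then overwrite it."""

    def __init__(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        self.samples = [0] * length      # buffer is zeroed before use
        self.pointer = 0                 # shared output/input pointer
        self.counter = length - 1        # counts down to the end of the buffer

    def process(self, new_sample):
        delayed = self.samples[self.pointer]     # 1) output the delayed value
        self.samples[self.pointer] = new_sample  # 2) store the new input
        self.pointer += 1                        # 3) advance the pointer
        if self.counter == 0:                    # end of buffer reached:
            self.pointer = 0                     #    wrap pointer, reload counter
            self.counter = len(self.samples) - 1
        else:
            self.counter -= 1
        return delayed


line = DelayBuffer(4)
outputs = [line.process(x) for x in range(1, 11)]
```

With a four-sample buffer, each output is the input from four calls earlier, and the first four outputs are the zeros the buffer was initialized with.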





[Block diagram: microphone, 'C2x EVM, speaker]

Figure 1. Hardware

Figure 2. Memory - delay buffer



Software 

This shows only the portion required for the delay buffer implementation. The 
entire program is on the BBS as 2XEVMBUF.EXE, which is a self-extracting zip 
file. 





; CONSTANTS

BUFFER_START    .set    08000h          ; Define delay buffer constants
BUFFER_LENGTH   .set    04000h

; MEMORY DEFINITION

; Reserve ext RAM for delay buffer
DELAY           .usect  "ext_mem",16384

; ZERO DELAY BUFFER

        larp    AR1
        lrlk    AR0,BUFFER_LENGTH-1     ; AR0 = Memory block length-1
        lrlk    AR1,BUFFER_START        ; AR1 = Delay Buffer pointer
        ZAC
ZERO1
        sacl    *+,AR0                  ; Initialize Delay Buffer to zero
        banz    ZERO1,AR1               ; Done??

; INITIALIZE DELAY BUFFER

        lrlk    AR0,BUFFER_LENGTH-1     ; AR0 = end of buffer counter
        lrlk
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AR1 = output /input 
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; INTERRUPT SERVICE ROUTINES 


RINT 


; Serial Port Receive Interrupt 
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Figure 3. Software 
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Dual Access into Single-Access RAM on a 'C5x Device 

Contributed by Mansoor Chishtie



Design Problem How do I make two accesses to single-access RAM (SARAM) in one cycle?

Solution 'C5x SARAM is NOT just one big RAM block where only one access per cycle is
allowed. Instead, it is made up of independent 2K-word RAM blocks,
each of which allows one CPU access per cycle. Hence, the CPU can read or write
one block while accessing another 2K block at the same time. All 'C5x processors
support multiple accesses to SARAM in one cycle as long as they go to different
RAM blocks. Where the total SARAM size is not a multiple of 2K words, one block
is made smaller than 2K words (see Table 1).

If you understand these restrictions, you can arrange code and
data to improve code performance.



Table 1. SARAM Blocks vs. Device

Device      Number of SARAM blocks

'C50        Four 2K blocks and one 1K block
'C53        One 2K block and one 1K block
'C51        One 1K block
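When arranging code and data, it can help to check which block a given word falls in. The Python sketch below is a planning aid only and rests on an assumption the note does not state: that the blocks are laid out contiguously from the start of SARAM in 2K-word units, with the smaller block last. Offsets are word offsets from the start of SARAM; all names are hypothetical.

```python
BLOCK_WORDS = 2048  # 2K-word SARAM block size from the text


def saram_block(offset_words):
    """Block index of a word offset, assuming contiguous 2K-word blocks
    starting at the beginning of SARAM (layout is an assumption)."""
    if offset_words < 0:
        raise ValueError("offset must be non-negative")
    return offset_words // BLOCK_WORDS


def can_access_in_same_cycle(offset_a, offset_b):
    """True when the two accesses fall in different blocks, which is the
    condition the text gives for two SARAM accesses in one cycle."""
    return saram_block(offset_a) != saram_block(offset_b)


# Example: code in the first block, a data table in the second.
conflict_free = can_access_in_same_cycle(0x0100, 0x0900)
```

Consult the 'C5x User's Guide memory map for the actual block boundaries on a given device before relying on this layout.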



The details of 'C5x SARAM organization appear in Chapter 4 of the 'C5x User's 
Guide (pp. 4-24, 6-2, C-2). Instruction cycle tables in Chapter 4 (pg. 4-24) cover all 



'C5x Dual-Access RAM 

'C5x dual-access RAM is TRULY dual-access in the sense that it will let you access 
twice per cycle with no restrictions on what locations are accessed. 
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A Simple Way to Terminate Unused TMS320C40 Comm Ports 

Contributed by Jim Patterson 



Design Problem Is there an easy way to terminate an unused comm port without using external pull- 
up resistors? 

Solution You can terminate the control lines on unused individual comm ports by tying the 
control lines together on the same comm port, i.e., 



CSTRB to CRDY
CREQ to CACK

The idea is that this would hold the control inputs high without the use of exter- 
nal pull up resistors. As a secondary effect, a port terminated like this would provide 
a monitor/test point, to which one could connect a logic analyzer or external device 
to capture data written to the port. 

Writing a Word to the Comm Port 
Case I: Writing to ports 0, 1, or 2

These are output ports at reset and will not request the token when the processor 
writes to the port (i.e., these ports already have the token). CSTRB will drive 
CRDY correctly for each byte written. 

Case II: Writing to ports 3, 4, or 5 

When the processor writes a word to the port the first time, the port will request the 
token and the control and data lines will switch directions without problems, but 
there is no way to make the CREQ line go active again. That port can continue 
sending words, but it cannot become an input port again. 
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Contributed by Jeff Beinart/Mansoor Chishtie 



Design Problem How does the RETE instruction work when another interrupt is pending? 

Solution Let's assume that we are in an interrupt service routine (ISR) and that an external
interrupt occurs which is low for three cycles, thereby setting the appropriate bit in the
IFR on the next cycle. However, since we are in the ISR, INTM = 1, globally
disabling the next interrupt from being recognized. The last instruction in the ISR is a
RETE.

If there is a pending interrupt in the IFR when RETE is executed, the 'C5x will
immediately jump to the pending ISR without going back to the interrupted code.



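[Figure 1. Pipeline diagram for cycles 1 through 10, with the interrupt occurring while in the ISR. Rows are listed in order of appearance; each pipeline stage lags the one above it by one cycle.

Fetch:    ISR1 ISR2 ISR3 RETE D D D I6 I7
Decode:   ISR1 ISR2 ISR3 RETE D D INTR
Read:     ISR1 ISR2 ISR3 RETE D D D INTR
Execute:  ISR1 ISR2 ISR3 RETE D D D INTR
INTM:     1 1 1 1 1 1 1]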

RETE changes INTM at the end of its execute phase.

INTM becomes active in cycle 7, which causes the interrupt to be jammed into the
decode phase on the next cycle.

Note that I6 and I7 in Figure 1 will be fetched again when the 'C5x returns from
the interrupt.

Note: D refers to "Dummy Cycle."
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Fast Logarithms on a Floating-Point Device 

Contributed by Keith Larson 



Design Problem What is the fastest way to calculate logarithms (base 2) on a TMS320C30 or 
TMS320C40? 



Solution The following TMS320C30/'C40 function calculates the log base two of a number
in about half the time of conventional algorithms. Furthermore, the method can
easily be scaled for faster execution if less accuracy is desired. The method is efficient
because the algorithm uses the floating-point multiplier's exponent/normalization
hardware in a unique way. The following is a proof of the algorithm.

The value of a floating-point number X is given by

X = 2^EXP_old * mant_old

If you then consider that the bit fields used to store the exponent and mantissa
are actually integers, you will notice that the exponent is already in log2 (log base 2)
form. In fact, the exponent is nothing more than a normalizing shift value.

By converting both sides of the first equation to a logarithm, we find that the
logarithm of the value becomes the sum of the exponent and mantissa in log form.



log2(X) = EXP_old + log2(mant_old)          (log base two)



Since EXP is in the exponent register, no calculation is needed and the value can 
be used directly as an integer. To extract the value of the exponent, PUSH, POP, 
and masking operations are used. 

The remaining mantissa conversion is done by first forcing the exponent bits to
zero using an LDE 1.0 instruction. This causes the exponent term 2^EXP to equal
1.0, leaving 1.0 <= Value < 2.0. Then, by using the following identity, the logarithm
of the mantissa can be extracted from the final result's exponent.

If the value (mant_old) is repeatedly squared, the sequence becomes 



X_new = mant_old^N

Where:

1.0 <= X_new < 2^N
N = 1, 2, 4, 8, 16, ...



Since the hardware multiplier will restructure the new value (X_new) during 
each squaring operation, we see that X_new will be represented by a new exponent 
(EXP_new) and mantissa (mant_new). 

X_new = 2^EXP_new * mant_new
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By then applying familiar logarithm rules, we find that EXP_new holds the
logarithm of mant_old. This is best shown by setting the previous two equations
equal to each other and taking the logarithm of both sides.



mant_old^N = 2^EXP_new * mant_new          N = 1, 2, 4, 8, 16, ...

N * log2(mant_old) = EXP_new + log2(mant_new)
log2(mant_old) = EXP_new/N + log2(mant_new)/N

This last equation shows that the logarithm of mant_old is indeed related to 
EXP_new. And as shown earlier, EXP_new can be separated from the new man- 
tissa and used as the logarithm of the original mantissa. 

We also need to consider the divisor N, which is defined to be the series 1, 2, 
4, 8, 16... , and EXP_new is an integer. The division by N becomes a shift for 
each squaring operation. What remains is to concatenate the bits of EXP_new to 
EXP_old and then repeat the process until the desired accuracy is achieved. 
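The derivation can be checked numerically on a host. The Python sketch below is a model of the idea, not the 'C30/'C40 routine: it renormalizes after every squaring and collects one fraction bit at a time, whereas the hardware version lets the exponent field accumulate seven bits before off-loading. The function name and the use of `math.frexp` to split exponent and mantissa are choices made for this illustration.

```python
import math


def log2_by_squaring(x, bits=24):
    """Return (log2 estimate, fraction bits) using repeated squaring.

    Each squaring doubles log2(mant); whenever the squared mantissa
    reaches 2.0 the exponent would increment, which contributes a 1 bit
    to the fraction of the logarithm.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    mant = m * 2.0                # renormalize so 1.0 <= mant < 2.0
    exp_old = e - 1               # integer part of the logarithm
    frac_bits = 0
    for _ in range(bits):
        mant *= mant              # now 1.0 <= mant < 4.0
        frac_bits <<= 1
        if mant >= 2.0:           # exponent bit produced by this squaring
            mant *= 0.5
            frac_bits |= 1
    return exp_old + frac_bits / float(1 << bits), frac_bits


value, fraction = log2_by_squaring(1.5)
first_seven_bits = fraction >> (24 - 7)   # compare with EXP after X^128
```

For an input of 1.5, the first seven fraction bits equal the exponent reached after seven squarings, which is the concatenation step the text describes.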

Example 

Consider a mantissa value of 1.5 and an exponent value of 0 (giving an exponent
multiplier of 2^0, or 1.0). The TMS320C30/'C40 extended register bit pattern
for the algorithm sequence is shown below.



Bit pattern of R0 = 1.5

Exp       Mantissa                          Power    Value             Exponent

00000000  1000000000000000000000000000000   X        = 1.5             Exp=0
00000001  0010000000000000000000000000000   X^2      = 2.25            Exp=1
00000010  0100010000000000000000000000000   X^4      = 5.0625          Exp=2
00000100  1001101000010000000000000000000   X^8      = 25.628906       Exp=4
00001001  0100100001101011101000001000000   X^16     = 656.84083       Exp=9
00010010  1010010101010011111101110011111   X^32     = 431.43988E3     Exp=18
00100101  0101101010110110101000010101001   X^64     = 186.14037E9     Exp=37
01001010  1101010110010010001010101100011   X^128    = 34.648238E21    Exp=74

XXXXXXXX S MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
<- Exp -> S <--------- Mantissa --------->

Hand-calculated value of log2(1.5)

log2(1.5) = 0.58496250 = 1001010 111000000
                         xxxxxxx            <- first 7 bits (exponent)
                                 mmm        <- quick 3 bits (mantissa)

If you compare the hand-calculated value and the binary representation of
log2(1.5), you will find that the sequence of bits in the exponent (seven bits'
worth) is equivalent to the seven MSBs of the logarithm. If the exponent could
hold all the bits needed for full accuracy, then it would be possible to continue
the operation for all 24 bits of the mantissa. Since there are only eight bits in the
exponent and the MSB is used for negative values, only seven iterations are
possible before the exponent must be off-loaded and reinitialized to zero.



By concatenating EXP_new to the previous exponent, longer strings of bits can
be built for greater accuracy. The process is then repeated until the desired accuracy
is achieved. Also remember that the original number's exponent, which represents
the whole number part of the result, becomes the eight MSBs of the final result.

Another trick is to look at the three MSBs of the mantissa and apply a roundup
from the fourth bit. Those same MSBs can then be used as a quick extension of the
exponent (logarithm). To visualize this, consider the following tabulated values and
graph.




Note: Notice how the fractional part is the same at the endpoints.

In the middle, only a slight bowing exists, which can either be ignored or
optionally rounded for better accuracy. The maximum actually occurs at a mantissa
value of 1/ln(2.0), or 1.442695. The value of log2(mant) at that point is 0.52876637,
giving a maximum error of 0.086071.

When finished, the bits representing the finished logarithm are in a fixed-point 
notation and will need to be scaled. This is done by using the FLOAT instruction 
followed by a multiplication by a constant scaling factor. If the final result needs to 
be in any other base, the scaling factor is simply adjusted for that base. 

Here are a few more helpful points. 

The round-off accuracy of the first three squaring operations will affect the final 
result if >21 mantissa bits are desired. A RND instruction placed after the first three 
MPYF R0,R0 instructions will remedy this, but adds to the cycle count. 

When the input value approaches 1.0, the result will be driven close to zero and
accuracy will suffer. In this case, an input range comparison and a branch to a
Maclaurin series expansion is used as a solution with minimal degradation in speed.
This is because the power series converges quickly for input values close to 1.0.

If you only need to calculate a visual-quality logarithm, such as in spectrum
analysis, the logarithm can often be calculated in one cycle. In this case the mantissa is
substituted directly into the fractional bits of the logarithm, giving a maximum error
of 0.086 (about 3.5 bits). The one cycle arises from the need to remove the 2's
complement sign bit in the TMS320C30/'C40's mantissa. As far as your eye is
concerned, it will never notice the difference!
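The quoted error bound for the visual-quality shortcut can be confirmed on a host. The Python sketch below models the shortcut as "integer exponent plus mantissa fraction" and evaluates the error at the mantissa value 1/ln(2) named in the text; the function names are hypothetical and `math.frexp` is used only to split the float.

```python
import math


def quick_log2(x):
    """Visual-quality log2: exponent plus the raw mantissa fraction."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)  # mantissa in [1, 2) minus 1 as fraction


def quick_log2_error(x):
    """Difference between the true log2 and the shortcut."""
    return math.log2(x) - quick_log2(x)


worst_case = quick_log2_error(1.0 / math.log(2.0))
```

The shortcut is exact at powers of two and bows below the true curve in between, which is why the error peaks mid-interval rather than at the endpoints.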
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***************************************************** 


* 


FAST logarithm for FFT displays * 


* »» NEED ONLY ADD ONE INSTRUCTION IN MANY CASES «« * 


************************************************************* 


1 1 

MPYF 


1 1 

REAL, REAL, RO 


; calculate the magnitude 


MPYF 


IMAG, IMAG,R1 


; Note: sign bit is zero 


ADDF 


Rl.RO 




ASH 


-1,R0 


;<- One instruction logarithm! 


STF 


R0,OOT 


scaled externally in DAC 


II II 

*************************************************************
* _log_E.asm                          DEVICE: TMS320C30     *
*************************************************************

        .global _log_E

_log_E: POP     AR1             ; return address -> AR1
        POPF    R0              ; X -> R0
        LDF     R0,R1
        LDI     2,RC            ; repeat 3x
        RPTB    loop            ; 8 + 13*3 + 9
        ASH     7,R1
        LDE     1.0,R0          ; EXP = 0
        MPYF    R0,R0           ; mant^2
        MPYF    R0,R0           ; mant^4
        MPYF    R0,R0           ; mant^8
        MPYF    R0,R0           ; mant^16
        MPYF    R0,R0           ; mant^32
        MPYF    R0,R0           ; mant^64
        MPYF    R0,R0           ; mant^128
        PUSHF   R0              ; offload 7 bits of exponent
        POP     R3
        ASH     -24,R3          ; remove mantissa
loop:   OR      R3,R1           ; R2 accumulates EXP <log2(man)>
        ASH     11,R1           ; Jam mant_R1 to top (concat. EXP_old)
        ASH     -20,R0          ; align and append the MSBs of mant_R0
        OR      R0,R1           ; (accurate to 3 bits)
        PUSHF   R1              ; PUSH EXP and Mantissa (sign is now data!)
        POP     R0              ; POP as integer (EXP+FRACTION)
        BD      AR1
        FLOAT   R0              ; convert EXP+FRACTION to float
        MPYF    @CONST,R0       ; scale the result by 2^-24 and change base
        ADDI    1,SP            ; restore stack pointer

        .data
CONST_ADR:
        .word   CONST
CONST   .long   0e7317219h      ; Base e hand calc w/1 lsb round
        .end

Figure 2. 
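The repeated-squaring loop in the listing (seven squarings give mant^128, whose exponent holds seven new logarithm bits, repeated three times) can be sketched in a high-level language. This is a minimal model under assumptions, not a translation of the routine: it uses Python doubles and math.frexp instead of the 'C30 register manipulations, and the name log2_by_squaring and its parameters are hypothetical.

```python
import math


def log2_by_squaring(x: float, passes: int = 3, squarings: int = 7) -> float:
    """Compute log2(x) by repeatedly squaring the mantissa.

    Each pass squares the mantissa `squarings` times; the exponent of the
    result supplies that many fractional bits of log2(mantissa).
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    mant, exp = math.frexp(x)        # x = mant * 2**exp, 0.5 <= mant < 1
    mant, exp = mant * 2.0, exp - 1  # renormalize so that 1 <= mant < 2
    frac_bits = 0                    # accumulated fractional bits, as an integer
    for _ in range(passes):
        for _ in range(squarings):
            mant *= mant             # after 7 squarings: mant**128 < 2**128
        m, e = math.frexp(mant)
        new_bits = e - 1             # floor(log2(mant**128)), in 0..127
        mant = m * 2.0               # keep the mantissa for the next pass
        frac_bits = frac_bits * (1 << squarings) + new_bits
    return exp + frac_bits / float(1 << (passes * squarings))
```

With the default three passes of seven squarings the result carries 21 fractional bits, for example log2_by_squaring(10.0) agrees with math.log2(10.0) to better than 1e-6. Multiplying by ln 2 would give the natural logarithm, which is the role the scaling constant plays in the listing.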
Switching From Bootloader to MP Mode with TM5 



Contributed by Daniel Chen 



Design Problem How can I change from bootloader mode to microprocessor mode "on-the-fly"? 

Solution The 'C31 bootloader is expected to be used at reset, after which the 'C31 stays in 
MCBL mode the rest of the time. Yet it is sometimes convenient, once loading 
is done, to change the MCBL/MP pin to enable the microprocessor memory map. 

The 'C31 device continues to sample the MCBL/MP pin status. Therefore, it 
is possible to change the MCBL/MP mode without resetting the device. The user 
needs to make sure that the MCBL/MP pin is high during bootloading and that the 
'C31 is not using the overlapping memory during the mode transition time. 

The user should use a routine that is guaranteed to perform no program fetch or data 
read/write to the overlapping memory during the transition time. A 
'C31-initiated interrupt issued via an otherwise unused pin (e.g., XF0 or a timer 
bit-I/O pin) can be used to request a transition on the MCBL pin. 
TMS320C5x Interrupt Response Time 

Contributed by Jeff Beinart 



Problem What are the important issues in TMS320C5x interrupt latency/processing? 

Solution This design note calculates the speed at which the TMS320C5x can recognize con- 
secutive interrupts. This time depends on the interrupt latency and the time 
required to service the interrupt. A practical application, a bar code scanner routine, 
will be used as an example. 

Figure 1 shows a bar code where the bar widths and spacings between bars are dif- 
ferent, signifying a specific number associated with each bar. To determine the 
width of a scanned bar, external hardware must generate an interrupt at both the 
leading and trailing edges of the bar. The TMS320C5x context switches to an ISR, 
which copies the contents of a Timer Register out to data memory. At some later 
time we can subtract the timer values, corresponding to each interrupt, and multiply 
by the Timer clock period to find the bar width. 

The Timer clock frequency is a key parameter. A higher frequency yields a more 
accurate calculation. 

Bar Code Width = (Difference in Timer Values)*(CLKOUT1 period) 
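As a rough illustration of the formula above, the following sketch converts two stored timer readings into a width. It rests on assumptions that the note does not state: that the TIM register counts down, that it is clocked once per CLKOUT1 period (25 ns at the 40-MHz machine rate used later), and that at most one 16-bit wraparound occurs between readings. The name bar_width_ns is hypothetical.

```python
CLKOUT1_PERIOD_NS = 25.0   # one machine cycle at 40 MHz
COUNTER_BITS = 16          # width of the TIM counter register


def bar_width_ns(tim_first: int, tim_second: int,
                 period_ns: float = CLKOUT1_PERIOD_NS) -> float:
    """Width between two timer readings, in nanoseconds.

    Assumes a down-counting timer, so the first reading is the larger one
    unless the counter wrapped; the modulo handles a single wrap.
    """
    ticks = (tim_first - tim_second) % (1 << COUNTER_BITS)
    return ticks * period_ns
```

For example, readings of 1000 and 988 differ by 12 ticks and give 300 ns. If the timer in a particular system counts up, the two arguments would simply be swapped.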





NOTES: 

1. Interrupts are generated at the leading and trailing edges of each bar. 

2. The ability of the 'C5x to recognize two consecutive interrupts will determine the 
widths of the bars and the spacings between bars. 



Figure 1. Example of a bar code with interrupts generated at edges 
Problem Definition 

1. Use a TMS320C5x-80 running at an internal machine rate of 40 MHz (inter- 
nal machine period of 25 ns per cycle). 

2. There are two stages to the Timer with each stage consisting of a period regis- 
ter and a counter register. Only the 16-bit wide TIM counter and PRD period 
registers of the second stage of the timer will be used. This means the ISR will 
copy the TIM counter register to data memory. 

3. In this application, the ISR consists of four instructions, thereby requiring a 
total of four locations in the vector table, thus reducing the external interrupt 
capability of the 'C5x by one. 

4. The interrupt latency of the 'C5x depends on the current contents of the pipe- 
line. The device always completes all instructions in the pipeline before exe- 
cuting the soft vector. It is up to the software engineer not to use multiple- 
cycle instructions or non-interruptible instructions (such as RPT & RPTZ) 
during the inner loop routine of the bar code scanner routine. 

5. The inner loop of the bar code scanner routine is small enough to be run out 
of internal program space. This could be from internal ROM or from internal 
program RAM. This will minimize the execution time of copying the Timer 
Counter register to data memory. 

6. The ISR dedicates AR5 as a pointer to the memory location where the TIM 
counter register gets written. AR5 is incremented within the ISR so that the 
information is constantly stored in different memory locations. AR5 cannot be 
used in the main program. 

ISR Implementation 

An external interrupt is generated whenever the edge of a bar code is detected. 
The 'C5x will recognize the interrupt and vector to an ISR which copies the TIM 
Timer counter register out to data memory and returns to the main routine. The 
ISR consists of the following instructions: 

MAR *,AR5 ; Load ARP with 5 

LAMM TIM ; Load lower half of ACC with TIM counter 

SACL *+ ; Store ACC low indirectly to memory 

RETE ; Return to main program and enable interrupts 

TIM is the Timer counter register, which is a peripheral memory-mapped 
register. The LAMM is a single-word, two-cycle instruction. All the other instruc- 
tions are single-word, single-cycle instructions. 

The RETE is specified as a four-cycle instruction because the pipeline is 
flushed on the return from the ISR; it therefore takes four cycles before the 
next instruction in the main routine executes. However, interrupts can be 
recognized as soon as the RETE is executed, so for this analysis RETE can be 
considered a single-cycle instruction. 

Figure 2 illustrates what the pipeline looks like when two consecutive inter- 
rupts are recognized by the 'C5x. The first interrupt occurs before cycle 1, 
whereas the second interrupt occurs before cycle 13. The significant pipeline 
events are summarized after Figure 2. 
                 1st interrupt occurs before cycle 1

Cycle     0     1     2     3     4     5     6     7     8
Fetch     I1    I2    I3    I4    I5    I6    D     D     D
Decode          I1    I2    I3    I4    I5    INTR  D     D
Read                  I1    I2    I3    I4    I5    INTR  D
Execute                     I1    I2    I3    I4    I5    INTR

                 2nd interrupt occurs in this period (cycles 9-13)

Cycle     9     10    11    12    13    14    (15)  16    17
Fetch     MAR   LAMM  SACL  D     RETE  D     D     D     I6
Decode    D     MAR   LAMM  SACL  D     RETE  D     D
Read      D     D     MAR   LAMM  D     SACL  RETE  D
Execute   D     D     D     MAR   LAMM  LAMM  SACL  RETE

Cycle     18    19    20    21    22    23    24    25    26
Fetch     D     D     D     MAR   LAMM  SACL  D     RETE  D
Decode    INTR  D     D     D     MAR   LAMM  SACL  D     RETE
Read            INTR  D     D     D     MAR   LAMM  D     SACL
Execute               INTR  D     D     D     MAR   LAMM  LAMM

Cycle     (27)  28    29    30    31    32    33    34    35
Fetch     D     D     I6
Decode    D     D
Read      RETE  D
Execute   SACL  RETE

Legend 

In     represents the specific instruction number, n 
D      stands for DUMMY cycle 
MAR    Modify Auxiliary Register 
LAMM   Load Accumulator low with contents of TIM register 
SACL   Store Accumulator Low out to Data Memory 
RETE   Return from interrupt and clear INTM bit (globally enable interrupts) 

Figure 2. Calculating the minimum time between interrupts 



Explanation of Figure 2 

 1. Cycle 0-1   High-to-low transition of the "1st" external interrupt occurs 
                before the fetch of I2. 

 2. Cycle 3     Interrupt must be held low for three clock cycles, whereupon it 
                is recognized by the CPU. 

 3. Cycle 4     Appropriate bit of the Interrupt Flag Register (IFR) is set, 
                signifying an interrupt has occurred. 

 4. Cycle 5     I6 is fetched; however, it will be refetched after the return 
                from interrupt. 

 5. Cycle 6     INTR jammed into the pipeline. 

 6. Cycle 9     Start of first instruction in the ISR. INTM is set to 1 (INTM=1) 
                and an IACK is generated. INTM is a global interrupt bit and 
                will globally disable any interrupts from occurring. IACK clears 
                an internal 'C5x FF and allows a bit in the IFR register to be 
                set by the next external interrupt. 

 7. Cycle 10    LAMM, a 1-word, 2-cycle instruction, is fetched. 

 8. Cycle 9-13  High-to-low transition of the "2nd" external interrupt (could 
                actually occur sometime after cycle 9 and before cycle 13). 

 9. Cycle 15    SACL instruction executes, completing the write of the TIM 
                Timer counter register to data memory [pointer register (AR5) 
                is updated by 1]. 

10. Cycle 16    RETE instruction executes, causing a return to the main program 
                and causing INTM=0 (global interrupt enabled). The pending 
                (second) interrupt can be recognized starting at the next cycle. 

11. Cycle 17    I6 is fetched; however, it will be refetched after the return 
                from interrupt. 

12. Cycle 18    INTR jammed into the pipeline. 

13. Cycle 27    SACL instruction executes, completing the write of the TIM 
                Timer counter register, corresponding to the second interrupt, 
                to data memory. 

In this analysis, the 'C5x cannot distinguish when the actual interrupt 
occurred between cycles 9 and 13; it responds to the interrupt in the same way. 
Let's assume the second interrupt occurs right before cycle 10. It takes four 
cycles from when the second interrupt occurs until it is recognized by the CPU 
and the corresponding bit of the Interrupt Flag Register (IFR) is set. The 
processor waits until INTM = 0 before recognizing the new interrupt. This occurs 
at cycle 16. Although I6 is fetched in cycle 17, it is discarded when INTR is 
jammed into the pipeline at cycle 18. 

The Timer counter register values are copied to data memory in cycles 15 and 
27. Therefore the bar code width is calculated as follows: 

Bar Code Width = (Clock cycle 27 - Clock cycle 15)*(CLKOUT1 period) 
               = (12 cycles)*(25 ns per cycle) 
               = 300 ns 



In the above calculation it was assumed the second interrupt occurred be- 
tween cycles 9 and 13. If the second interrupt occurs at cycle 14, then the reading 
of the Timer value to data memory will be delayed by one additional cycle (if it 
occurs at cycle 15, then it will be delayed by two cycles and so forth). 
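The cycle arithmetic in this analysis can be captured in a small model. It is only a sketch built from the numbers quoted in the text (first store at cycle 15, second store at cycle 27, one extra cycle of delay for each cycle the second interrupt arrives after cycle 13); the function names are hypothetical and the model says nothing about cases outside the window analyzed here.

```python
CYCLE_NS = 25.0                 # 40-MHz machine rate
FIRST_STORE_CYCLE = 15          # SACL for the first interrupt
BASE_SECOND_STORE_CYCLE = 27    # SACL for a second interrupt in cycles 9-13
LATEST_UNDELAYED_CYCLE = 13


def second_store_cycle(second_interrupt_cycle: int) -> int:
    """Cycle in which the second TIM value is written to data memory."""
    if second_interrupt_cycle < 9:
        raise ValueError("the analysis only covers interrupts from cycle 9 on")
    delay = max(0, second_interrupt_cycle - LATEST_UNDELAYED_CYCLE)
    return BASE_SECOND_STORE_CYCLE + delay


def measured_width_ns(second_interrupt_cycle: int) -> float:
    """Width computed from the two stored timer values, in nanoseconds."""
    cycles = second_store_cycle(second_interrupt_cycle) - FIRST_STORE_CYCLE
    return cycles * CYCLE_NS
```

Any second interrupt between cycles 9 and 13 yields the same 300 ns reading, which is why 12 cycles is the smallest spacing the routine can resolve; an interrupt at cycle 14 is stored at cycle 28 and reads as 325 ns.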

Conclusion 

The TMS320C5x is able to recognize interrupts every 12 clock cycles. Three of the 
ISR instructions (MAR, LAMM, and SACL) consume four cycles. Because the 'C5x has 
a pipelined architecture, it is difficult to calculate the interrupt latency 
directly. For this analysis the best-case interrupt latency is found to be eight 
cycles (12 - 4). 
TMS320C2x/'C5x EVM AIC Initialization and 

Contributed by Thomas G. Homer, P.E. 

Design Problem What are the issues in initializing the AIC on the TMS320C2x/'C5x EVMs? 

Solution Texas Instruments' TMS320C2x and TMS320C5x EVMs come with 'C26/'C51 
DSPs interfaced by serial port to a TLC32046 Wide-Band Analog Interface Circuit 
(AIC), with the AIC providing the frame sync pulses and shift clocks. The AIC is a 
configurable device which uses the serial port to download commands from the 
DSP. The communications protocol uses an interleaving technique which will not 
disrupt normal output of the DAC. 

There are primary and secondary transmit data word formats. The primary data 
word is the normal data output format and the secondary data word carries 
configuration data to the AIC. Both data word formats use bits 0-1 to send commands 
to the AIC, while bits 2-15 carry either the data word (Primary) or the 
configuration word (Secondary). A list of the functions is shown in Table 1 (Primary) 
and Table 2 (Secondary): 





Table 1. Primary Data Word Commands 

D1   D0   Function 
0    0    Normal Output 
0    1    Increase Sample Rate 
1    0    Decrease Sample Rate 
1    1    Initiate Secondary Communications 

Table 2. Secondary Data Word Commands 

D1   D0   Function 
0    0    Update TA/RA Registers 
0    1    Update TA'/RA' Registers 
1    0    Update TB/RB Registers 
1    1    Update Control Register 
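The two command bits can be illustrated with a short sketch. It assumes a 16-bit word whose two least significant bits carry the command and whose upper 14 bits carry the data or configuration field, as described above; the function and constant names are hypothetical, and the values 03h and 0277h are the constants used in the listings that follow.

```python
# Primary data word commands (bits 1-0), per Table 1.
CMD_NORMAL = 0b00
CMD_INCREASE_RATE = 0b01
CMD_DECREASE_RATE = 0b10
CMD_SECONDARY = 0b11

# Secondary data word commands (bits 1-0), per Table 2.
SECONDARY_TARGETS = {
    0b00: "TA/RA registers",
    0b01: "TA'/RA' registers",
    0b10: "TB/RB registers",
    0b11: "control register",
}


def primary_word(sample16: int, command: int = CMD_NORMAL) -> int:
    """Replace the two low bits of a left-justified 16-bit sample with a command."""
    return (sample16 & 0xFFFC) | (command & 0b11)


def secondary_target(word: int) -> str:
    """Name the register group that a secondary word updates."""
    return SECONDARY_TARGETS[word & 0b11]
```

A zero sample with the secondary-communication command gives 0003h, and a secondary word of 0277h decodes as a control register update.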



The timing between the Primary and Secondary data words is fairly tight for the 
'C26. This design note is intended to clarify the technique required when reconfigur- 
ing the AIC using either the 'C2x or 'C5x EVMs. 
The DSP and AIC use separate oscillators to generate their respective Master 
Clocks, which introduces an additional constraint on the timing between the 
Primary and Secondary data words because of the potential phase offset between 
the two clocks. 

'C26 Master Clock      = 40.000 MHz 
'C26 Instruction Cycle = 10 MHz 

'C50 Master Clock      = 20.000 MHz 
'C50 Instruction Cycle = 20 MHz 

AIC Master Clock       = 10.368 MHz 
AIC Shift Clock        = 2.592 MHz 
                       = 3.86 'C26 Instruction Cycles per shift clock 
                       = 7.72 'C50 Instruction Cycles per shift clock 

The maximum AIC conversion frequency is 25 kHz, which gives a minimum 
period of 40 µs between data samples. When the Primary data word command 
bits are set to 11b, the Secondary transmit frame sync pulse goes LOW FOUR 
AIC SHIFT CLOCKS after the end of the Primary transmission. This timing 
allows the secondary command communications to occur between normal data 
communications to the DAC on the AIC. To correctly reconfigure the AIC, the 
secondary command word must be written to the DSP's Serial Port Transmit 
Register (DXR) before the secondary frame sync pulse goes low. If the TRANSMIT 
interrupt is used to control writes to the DSP DXR during AIC configuration, the 
XINT signal occurs approximately 15 ('C26) or 154 ('C50) instruction cycles 
before FSX goes LOW to signal the start of transmission (best case, assuming 
that the DSP and AIC Master Clocks are in phase). If the Master Clocks are not 
in phase (high probability), then there will be 2-3 fewer instruction cycles 
available before FSX goes low again. The diagram in Figure 1 shows the timing 
between the Primary and Secondary AIC transmissions and the 'C26 and 'C50 XINT 
signal. Note that the 'C26 XINT occurs after all 16 bits have been shifted out of 
the transmit shift register (SXR), while in the 'C50, XINT occurs after the 
contents of the data register (DXR) are loaded into the SXR at the beginning of 
the transmission. This difference in serial port operation makes the timing on 
the 'C50 much easier to meet. 

Figure 1. Timing diagrams 

Software Examples: 

The following software shows an example of how to reconfigure the AIC by writing 
to the AIC control register. This technique can be used for any secondary 
communications to the AIC. There is a latency of approximately 10/17 instruction 
cycles ('C26/'C51) from XINT to writing to the DXR. 
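The timing budget described above can be worked through numerically. The sketch below uses only figures quoted in this note, plus two labeled assumptions: that the 'C50 XINT leads the secondary frame sync by the 16 data bits plus the four-shift-clock gap (which reproduces the quoted 154 cycles), and that the 17-cycle latency quoted for the 'C51 applies to the 'C50 as well. All names are hypothetical.

```python
AIC_SHIFT_CLOCK_HZ = 2.592e6                     # listed AIC shift clock
INSTRUCTION_RATE_HZ = {"C26": 10e6, "C50": 20e6}

# Shift clocks between XINT and the secondary FSX going low.
# Assumption: 'C26 XINT fires after the last bit, 'C50 XINT at the start.
SHIFT_CLOCKS_AFTER_XINT = {"C26": 4, "C50": 16 + 4}

XINT_TO_DXR_LATENCY = {"C26": 10, "C50": 17}     # instruction cycles
PHASE_OFFSET_LOSS = 3                            # worst case of "2-3 fewer"


def cycles_per_shift_clock(dsp: str) -> float:
    """Instruction cycles that elapse during one AIC shift clock."""
    return INSTRUCTION_RATE_HZ[dsp] / AIC_SHIFT_CLOCK_HZ


def cycles_available(dsp: str) -> float:
    """Best-case instruction cycles between XINT and the secondary FSX."""
    return SHIFT_CLOCKS_AFTER_XINT[dsp] * cycles_per_shift_clock(dsp)


def worst_case_margin(dsp: str) -> float:
    """Cycles left after the phase offset and the XINT-to-DXR latency."""
    return cycles_available(dsp) - PHASE_OFFSET_LOSS - XINT_TO_DXR_LATENCY[dsp]
```

Under these assumptions the 'C26 is left with only about two spare instruction cycles, while the 'C50 has well over a hundred, which is consistent with the remark that the 'C26 timing is fairly tight.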



'C26 PROGRAM 

AIC_RESET_LO    .set    0ffefh          ; Define constants for AIC
AIC_RESET_HI    .set    010h
AIC_SETUP       .set    03h
AIC_CONTROL     .set    0277h

; Primary Transmit Data Word Format
;
;    d12 d8  d4  d0
;  d14 d10 d6  d2
;  | | | | | | | |
;  v v v v v v v v
; xxxxxxxxxxxxxx11b         Signals secondary Xmit mode
;
; Secondary Transmit Data Word Format
;                           Secondary Command Syntax
;                           (d1/d0 indicate mode)
;   |TA |  |RA |
; xx10010xx1001000b         TA and RA counter setup example
;                           valid range: 4-63
;  | TB |  | RB |
; x100100x10010010b         TB and RB counter setup example
;                           valid range: 15-127
;       |  ctrl  |
; xxxxxx1010110111b         Control word setup example (0/1)
;                           d2 = (out/in) A/D highpass filter
;                           d3 = (out/in) loopback function
;                           d4 = (no /yes) Aux input pins
;                           d5 = (no /yes) RX & TX synchronous
;                           d7/d6 = Gain  0/0 = 1
;                                         0/1 = 2
;                                         1/0 = 4
;                                         1/1 = 1
;                           d8 = don't care
;                           d9 = (out/in) second order sin x/x filter

; MEMORY DEFINITION
                                        ; Reserve RAM for operands
        .bss    AIC_CNTL,1              ; AIC control temp memory

; INTERRUPT VECTORS
        .sect   "ext_vecs"              ; Section for external
                                        ; interrupt vectors
        B       START                   ; Processor Reset
        B       INT0                    ; External Interrupt #0
        B       INT1                    ; External Interrupt #1
        B       INT2                    ; External Interrupt #2
        .sect   "int_vecs"              ; Section for internal
                                        ; interrupt vectors
        B       TINT                    ; Timer Interrupt
        B       RINT                    ; Serial Port Receive Interrupt
        B       XINT                    ; Serial Port Transmit Interrupt
        B       TRAP                    ; S/W Trap

; CODE
        .text                           ; Section for program code
START
        dint                            ; Global interrupt disable

; INITIALIZE SERIAL PORT
        ldpk    0
        fort    0                       ; Set for 16-bit word operation
        sfsm                            ; Set for frame sync control
        rtxm                            ; Set for external Xmit frame sync

; INITIALIZE TLC32046 AIC
        ldpk    AIC_CNTL
        lalk    AIC_RESET_LO            ; Force AIC RESET low
        sacl    AIC_CNTL
        out     AIC_CNTL,PA2
        rptk    20                      ; Keep LO for 2 usec
                                        ; (Spec=800 nsec min)
        nop
        ork     AIC_RESET_HI            ; Set AIC RESET high
        sacl    AIC_CNTL
        out     AIC_CNTL,PA2
        ldpk    0
        lack    020h                    ; Enable transmit interrupt
        sacl    IMR
        eint                            ; Enable global interrupts
        zac                             ; Dummy Xmit to synchronize the
        sacl    DXR                     ; 'C26 and AIC
        lalk    AIC_SETUP               ; Signal secondary Xmit mode
        idle
        lalk    AIC_CONTROL             ; Send control word
        idle
        dint                            ; Disable interrupts to reconfig
                                        ; serial port
        lack    010h                    ; Enable serial port RECEIVE
                                        ; interrupts
        sacl    IMR                     ; REC = IMR b4
        eint                            ; Enable global interrupts

; MAIN ROUTINE
MAIN
        idle
        b       MAIN

; INTERRUPT SERVICE ROUTINES
RINT                                    ; SERIAL PORT RECEIVE INTERRUPT
        ldpk    0
        lac     DRR,4                   ; Read latest AIC input w/ 16x gain
        sacl    DXR                     ; Echo to AIC output
        eint                            ; Re-enable GLOBAL interrupt
        ret                             ; Return to MAIN
XINT                                    ; SERIAL PORT TRANSMIT INTERRUPT
        sacl    DXR                     ; Write to DXR register
        eint
        ret

; UNUSED INTERRUPT TRAPS
INT0    idle                            ; External Interrupt #0
INT1    idle                            ; External Interrupt #1
INT2    idle                            ; External Interrupt #2
TINT    idle                            ; Timer Interrupt
;RINT   idle                            ; Serial Port Receive Interrupt
;XINT   idle                            ; Serial Port Transmit Interrupt
TRAP    idle                            ; S/W Trap

        .end

Figure 2. 'C26 program 



NOTE: 

The 'C5x EVM requires the following command be incorporated into the 
EVMINIT.CMD file to force the 'C50 into microprocessor mode: 

E PMST=0x08 

In addition, whenever the processor is RESET from within the Debugger and soft- 
ware reloaded, you need to issue the following command to get the 'C50 back into 
microprocessor mode: 



?PMST=0x08 



'C50 PROGRAM 

AIC_RESET_LO    .set    0ffefh          ; Define constants for AIC
AIC_RESET_HI    .set    010h
AIC_SETUP       .set    03h
AIC_CONTROL     .set    0277h

; Primary Transmit Data Word Format
;
;    d12 d8  d4  d0
;  d14 d10 d6  d2
;  | | | | | | | |
;  v v v v v v v v
; xxxxxxxxxxxxxx11b         Signals secondary Xmit mode
;
; Secondary Transmit Data Word Format
;                           Secondary Command Syntax
;                           (d1/d0 indicate mode)
;   |TA |  |RA |
; xx10010xx1001000b         TA and RA counter setup example
;                           valid range: 4-63
;  | TB |  | RB |
; x100100x10010010b         TB and RB counter setup example
;                           valid range: 15-127
;       |  ctrl  |
; xxxxxx1010110111b         Control word setup example (0/1)
;                           d2 = (out/in) A/D highpass filter
;                           d3 = (out/in) loopback function
;                           d4 = (no /yes) Aux input pins
;                           d5 = (no /yes) RX & TX synchronous
;                           d7/d6 = Gain  0/0 = 1
;                                         0/1 = 2
;                                         1/0 = 4
;                                         1/1 = 1
;                           d8 = don't care
;                           d9 = (out/in) second order sin x/x filter

; MEMORY DEFINITION
                                        ; Reserve RAM for operands
        .bss    AIC_CNTL,1              ; AIC control temp memory

; INTERRUPT VECTORS
        .sect   "vectors"               ; Section for external interrupt
                                        ; vectors
        B       START                   ; Processor Reset
        B       INT0                    ; External Interrupt #0
        B       INT1                    ; External Interrupt #1
        B       INT2                    ; External Interrupt #2
        B       TINT                    ; Timer Interrupt
        B       RINT                    ; Serial Port Receive Interrupt
        B       XINT                    ; Serial Port Transmit Interrupt
        B       TRINT                   ; TDM Serial Port Receive Interrupt
        B       TXINT                   ; TDM Serial Port Transmit Interrupt
        B       INT3                    ; External Interrupt #3
        .space  10*16                   ; Reserved space - 10 words
        B       TRAP                    ; S/W Trap
        B       NMI                     ; Non-maskable external interrupt

; CODE
        .text                           ; Section for program code
START
        ldp     #0                      ; Initialize data pointer
        setc    INTM                    ; Global interrupt disable
        splk    #0h,IMR                 ; Clear interrupt mask register

; SETUP S/W WAIT-STATE GENERATOR
        splk    #08,CWSR                ; Normal wait-state mapping,
                                        ; I/O space=64K
        splk    #0,PDWSR                ; Prog/Data space=0 wait-states
        splk    #05555h,IOWSR           ; I/O space=1 wait-states

; INITIALIZE SERIAL PORT
        ldp     #0
        splk    #08h,SPC                ; DLB=0  (b1): loopback mode disabled
                                        ; FO=0   (b2): 16-bit word operation
                                        ; FSM=1  (b3): frame sync control
                                        ; MCM=0  (b4): external CLKX
                                        ; TXM=0  (b5): external FSX
                                        ; XRST=0 (b6): Xmit RESET
                                        ; RRST=0 (b7): Rec RESET
        opl     #0c0h,SPC               ; XRST=1 (b6): Xmit ENABLE
                                        ; RRST=1 (b7): Rec ENABLE

; INITIALIZE TLC32046 AIC
        ldp     AIC_CNTL
        lacc    #AIC_RESET_LO           ; Force AIC RESET low
        sacl    AIC_CNTL
        out     AIC_CNTL,PA2
        rpt     #40                     ; Keep LO for 2 usec
                                        ; (Spec=800 nsec min)
        nop
        or      #AIC_RESET_HI           ; Set AIC RESET high
        sacl    AIC_CNTL
        out     AIC_CNTL,PA2
        ldp     #0
        opl     #020h,IMR               ; Enable Xmit interrupt bit
        opl     #0h,IFR                 ; Clear pending interrupts
        clrc    INTM                    ; Global interrupt enable
        lacl    #0                      ; Dummy Xmit for
        sacl    DXR                     ; synchronization
        lacc    #AIC_SETUP              ; Signal secondary Xmit mode
        idle
        lacc    #AIC_CONTROL            ; Send control word
        idle
        setc    INTM                    ; Global interrupt disable
        lacl    #010h                   ; Enable serial port RECEIVE
                                        ; interrupts
        sacl    IMR                     ; REC = IMR b4
        opl     #0h,IFR                 ; Clear pending interrupts
        clrc    INTM                    ; Global interrupt enable

; MAIN ROUTINE
MAIN
        idle
        b       MAIN

; INTERRUPT SERVICE ROUTINES
RINT                                    ; SERIAL PORT RECEIVE INTERRUPT
        ldp     #0
        lacc    DRR,4                   ; Read latest AIC input w/ 16x gain
        samm    DXR                     ; Echo to AIC output
        rete                            ; Return to MAIN w/ interrupt enable
XINT                                    ; SERIAL PORT TRANSMIT INTERRUPT
        samm    DXR
        rete

; UNUSED INTERRUPT TRAPS
INT0    idle                            ; External Interrupt #0
INT1    idle                            ; External Interrupt #1
INT2    idle                            ; External Interrupt #2
TINT    idle                            ; Timer Interrupt
;RINT   idle                            ; Serial Port Receive Interrupt
;XINT   idle                            ; Serial Port Transmit Interrupt
TRINT   idle                            ; TDM Serial Port Receive Interrupt
TXINT   idle                            ; TDM Serial Port Transmit Interrupt
INT3    idle                            ; External Interrupt #3
TRAP    idle                            ; S/W Trap
NMI     idle                            ; Non-maskable external interrupt

        .end

Figure 3. 'C50 program 
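The serial port control values written in the 'C50 listing can be decoded bit by bit. The sketch below is illustrative only: the bit positions and names follow the comments in the listing (b1 through b7, with the loopback bit written here as DLB), and the helper name spc_bits_set is hypothetical.

```python
# Bit positions taken from the comments in the 'C50 listing.
SPC_BITS = {
    1: "DLB",    # loopback mode
    2: "FO",     # word format (0 = 16-bit)
    3: "FSM",    # frame sync control
    4: "MCM",    # clock source (0 = external CLKX)
    5: "TXM",    # frame sync source (0 = external FSX)
    6: "XRST",   # transmit reset (1 = enabled)
    7: "RRST",   # receive reset (1 = enabled)
}


def spc_bits_set(value: int) -> list:
    """Return the names of the SPC bits that are set in value, low bit first."""
    return [name for bit, name in sorted(SPC_BITS.items()) if value & (1 << bit)]
```

The first write, 08h, sets only FSM while holding both halves of the port in reset; ORing in 0C0h afterwards releases the transmitter and receiver without disturbing the other bits.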
A Novel Way of Using TMS320C40 Cache 

Contributed by Keith Larson 



Design Problem How can I place any value into the TMS320C40 cache? 



A usual approach to loading the cache is to unfreeze the cache and let it always be 
filled, hoping for a looped block of code. By freezing the cache at the end of time- 
sensitive routines, a little more performance can be expected since the cache does 
not always have to be filled from external memory on the first pass through. How- 
ever, the cache may not always fill completely due to code dependencies or condi- 
tional branching. In this case, it would be desirable to load the contents of any 
address into the cache. 

The routine shown in Figure 1 will poke opcodes from an arbitrary address into 
the cache using a feature of the interrupt processor. In this case, when the RETI 
opcode is executed, writing PGIE to GIE, one opcode following the RETI is pro- 
tected from interrupts and is always fetched (and executed). By properly controlling 
the value of TOS, it is possible to load any external address pointed to by TOS into 
the cache! In this case, an interrupt vector is used to loop the cache loader back to 
itself each time an opcode is loaded into the cache. 

Caution: Since any opcode can be executed in any order, it is important to con- 
trol the potential action of all opcodes fetched in this manner. For example, if an 
opcode is supposed to write data to a location pointed to by an auxiliary register, it 
would make sense to make sure that all the auxiliary registers point to a safe 
"dummy" location. Likewise, adequate controls should be placed on the loader to 
ensure that the correct status is always loaded back into the CPU after each cache 
load. 

Also note that DATA values can be poked into the cache. Since all opcodes 
going into the cache are executed, unpredictable results may occur when loading 
such a value. If a DATA value is loaded into the cache, that value is NOT 
accessible as data from the cache, since the DDATA bus cannot be connected to the 
cache for a transfer; i.e., only program fetching is allowed from the cache. 

Please note: The routine shown in Figure 1 does not include a full save and re- 
store, nor does it control the values of the data pointers (DP and ARs). It is the pro- 
grammer's responsibility to add the code necessary to provide the context save 
routines and other error checking. 
; void lcache(*ptr, len);
; loads the program pointed to by ptr into
; the cache from external memory

        .global start,test,FLAG_0,_lcache
        .global dec,inc,more,RST,NMI,TINT_0
        .text

RST     .word   $               ; set up temporary IVTP in external RAM;
NMI     .word   $+1             ; need to align at 512 word boundary
TINT_0  .word   _lcache

start:  ldp     RST             ; set up a new vector table
        ldi     @RST,R0
        ldpe    R0,IVTP
        ldi     @stack,SP       ; set up a runtime stack
        ldi     @a_test,R0      ; subroutine to load is "test"
        sti     R0,@APC
        ldi     16,R0           ; load 16 cache locations
        sti     R0,@CNT
        sti     IIF,@FLAG       ; keep original IIF
        and     0E3FFh,ST       ; clear, thaw and enable cache
        or      5800h,ST
        call    $+1             ; a way to push PC on stack
        pop     R0              ; takes care of first dummy pop
        addi    4,R0
        push    R0
        call    _lcache         ; call the cache loader
        or      00C00h,ST       ; freeze and enable cache
        ldi     @FLAG,IIF       ; restore IIF

test    ldi     15,R0           ; Test code to cram into the cache
dec     subi    1,R0            ; with conditional branches
        bnn     dec
        ldi     -15,R0
inc     addi    1,R0
        bn      inc
        bud     test
        nop
        nop
        nop

_lcache ldp     APC
        ldi     @CNT,R1
        subi    1,R1
        bnz     more
        pop     R0              ; pop junk address
        rets                    ; return to original caller

more    sti     R1,@CNT
        pop     R1              ; pop junk address
        ldi     @APC,R1         ; create "new" return address
        addi    1,R1
        sti     R1,@APC
        push    R1
        ldi     @TINT0,IIF      ; turn on TINT0
        ldi     1,IIE           ; enable TINT0
        reti                    ; after return, fetch 1 opcode

        .global FLAG,APC,CNT
        .global a_test,stack
FLAG    .word   0               ; Original IIF register
APC     .word   0               ; Auxiliary Program Counter
CNT     .word   0               ; length to load
TINT0   .word   01000000h
a_test  .word   test-1
stack   .word   $               ; reserve stack locations

Figure 1. 
Hardware UART for TMS320C3x 

Contributed by Lawrence Wong 

Design Problem The TMS320C3X does not offer a UART for asynchronous communication. 



A software UART emulator is available on the DSP BBS; it allows the 
TMS320C3x to perform asynchronous communication. In some instances, however, 
a hardware UART may be necessary for a particular application. The following 
describes one possible solution for a hardware UART. This design was originally 
done in an FPGA and can easily be transferred to an ASIC. The design can be 
modified to accommodate faster data rates or different communication protocols. 



[Block diagram: 25-MHz oscillator, TMS320C30, UART logic (transmit logic and 
receive logic), and RS-232 driver; signal labels include DX0, H3, and RX.] 

Figure 1. 



The following schematic is for a 9600-baud UART with one stop bit and a start 
bit. The clock signal, H3, is supplied to the circuit from the TMS320C3x. The DSP 
was running with a 25-MHz clock, which was necessary for a particular application. 
Modification to the FPGA timing circuit will be necessary to accommodate a higher 
clock speed for the DSP. 
[Schematic: transmit logic built from flip-flops clocked by H3; signal labels 
include Stop Bit, XEN, H3, and FSXR.] 

Figure 2. Transmit circuitry 



The 'C30 transmit section of the serial port is configured to output eight bits 
of data at a rate of approximately 9.6 KHz. This is achieved by using one of the 
'C30's internal timers and programming it to the desired 9.6 KHz frequency. The 
transmitting port is configured in the fixed burst mode. This allows the leading 
FSX signals to help initiate a start bit for the UART protocols. The stop bit is 
generated at the end of the eighth bit by the UART circuitry. 

The receive section of the UART is activated when the circuitry detects the 
start bit. The start bit is a logical zero. The delay circuit is activated on the 
falling edge of the start bit. The delay is used so that sampling of the incoming 
data bits occurs in the middle of each bit period, giving the UART higher 
noise immunity. 




Figure 3. Receive circuitry 



After the delay is performed, the timer is activated. The timer has a period of 
104 µs, which corresponds to approximately 9.6 kHz. At each period, a data bit is 
sampled into an eight-bit shift register. After all eight bits are received, the 
data is passed to the 'C30 at a speed of 1/8 of the H3 clock. The FPGA circuitry 
interfaces to the 'C30 serial port in the fixed burst mode of operation. Both the 
clock and the frame sync signals are generated by the FPGA circuitry. 
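The receive timing can be sketched numerically. The bit period follows directly from the baud rate; the half-bit start delay used below is an assumption (the note only says that the delay centers the sampling point), and the function names are hypothetical.

```python
def bit_period_us(baud: int) -> float:
    """Duration of one bit in microseconds."""
    return 1e6 / baud


def sample_times_us(baud: int, data_bits: int = 8) -> list:
    """Sampling instants measured from the falling edge of the start bit.

    Assumption: the delay circuit waits half a bit, so the first timer
    period ends in the middle of the first data bit (1.5 bit times).
    """
    period = bit_period_us(baud)
    return [(1.5 + k) * period for k in range(data_bits)]
```

At 9600 baud the bit period is about 104.17 µs, matching the 104 µs timer, and under the half-bit assumption the first data bit would be sampled about 156 µs after the start edge.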

This UART circuitry can easily be designed into an ASIC which could also be 
incorporated into a Configurable Digital Signal Processor (cDSP). Modification to 
this circuit could be done for different serial communication protocols or even 
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Using VRAMs and DSPs for System Performance 

Contributed by Alex Tessarolo



How can I improve my DSP-based system memory performance? 
A Case Study 

HDD designers talk about integration of drive components. Typically they have
focused on integrating the glue logic into an ASIC, moving from an analog to a
digital servo, and then reducing the drive to a single processor. Integration of the
host I/F with the µC/DSP has also been discussed. One area that has also gone
through some integration is the memory on the drive. Ideally, HDD designers would
like to have one large memory buffer. However, they then run into performance
bottlenecks because there is only one path into this memory, and it must be shared
by many sources. Given that drive performance is also increasing, the problem
becomes even more serious. The problem is discussed in detail below, followed by
a solution based on Video RAM technology.

Problem Description: 

"Today" a typical HDD block diagram looks like this: 



[Block diagram: Host, Host I/F, Buffer Memory (DRAM/SRAM), Data Memory (DRAM/SRAM), ASIC, R/W Channel, R/W Head, µC, DSP, Actuator/Motor]



Figure 1. 
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The buffer memory holds incoming or outgoing blocks of data to be written 
or read into the disk platter. The data memory would ideally hold the µC (or
DSP) program and data structures. The µC would be masked with a boot program
that would download the SCSI/AT code from the disk platter into the data
buffer.

Ideally, HDD manufacturers would like to merge the µC and DSP functionality
and the buffer and data memories. There are many other possible partitions,
but the above are the most popular and would reduce system cost significantly.

A TI DSP makes a good solution for the DSP/µC integration.

What HDD designers are running into is performance bottlenecks when
the buffer and data memory blocks are merged.

The merged buffer/data memory block would have three sources trying to 
read and write data via a single port. As data throughput increases, the arbitrator 
(built into the ASIC) must prioritize access. Eventually this will limit the data 
throughput and hence the performance of the HDD. HDD manufacturers are cur- 
rently hitting these limits. 

VRAM Solution: 

A Video RAM is the perfect solution for these data bottlenecks. For instance, a 
triple-port VRAM (depicted below), would greatly enhance the data throughput: 



[Block diagram: Triple-Port VRAM. The Host I/F and the R/W Channel each connect to a serial buffer; the DSP/µC connects to the DRAM cell array.]





Figure 2. 



The Host I/F and the R/W channel can read and write data to the serial buffers
at very high speeds without affecting the DSP/µC access to the data in the
DRAM cells. When the serial buffer is full, the DSP/µC can transfer a block of
data to the DRAM cell in a few cycles. The DSP/µC can therefore execute its
program from the VRAM without sacrificing performance.
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Using the RBIT on the TMS320E25 

Contributed by Keith Larson 



Design Problem How does the TMS320E25 RBIT work? 

Solution The RBIT primarily functions by disconnecting the internal program memory bus 

(PBUS) from the MUX which combines the internal data bus (DBUS) to create the 
externally shared program/data bus. The disconnect is made at the MUX and the in- 
ternal nodes are left floating. 

This diagram shows the location of the RBIT switch disconnecting the external 
and internal program spaces. The multiplier is shown as if it were receiving data as 
from a MAC instruction which will be discussed later. 




Figure 1. 



What does this mean? On the TMS320E25 some instructions may appear to 
work and others will not. It all depends on whether or not an external data transfer 
from program or data space needs to be connected up to the internal program bus 
(PBUS). For instance, TBLW, BLKP, and other related mnemonics may appear to 
work when they are used to transfer external program memory to the internal data 
space connected to DBUS. You can probably see quickly that a transfer from the
internal program space to the external data bus will not work. This also prevents
any external code from being executed. This is what the RBIT is supposed to do,
i.e., protect your code.
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Other Points to Consider: 

Note how the open switch disconnects the PBUS from the outside program
space. This is why the µP mode will not work for a TMS320E25 after the RBIT is
set. It also means that you cannot supplement your application with additional
external code.

Secondly, the MAC instructions will not work with external program
coefficients. In this case, the MAC instructions are supplying the on-chip multiplier
with one operand from the DBUS and the other from the PBUS. The problem is
that the external program space needs to be connected to the PBUS and the
RBIT switch is in the way. To solve the problem, the coefficients should be
moved to internal program RAM block B0 or read directly from the EPROM.

The RBIT also disables the EPROM programming mode, essentially disallow- 
ing an external EPROM programmer from reading the EPROM contents. It is 
therefore impossible to verify the EPROM contents once the RBIT has been set. 

On the TMS320E1x devices the RBIT works by logically disabling the µC/µP
pin and the EPROM programming mode. On the TMS320E25 this would not
have worked, since any opcode fetch from beyond the 4K boundary would
constitute a breach of security. That is, a simple branch to an external debug routine
would be all that is needed to get to the internal EPROM code. On the
TMS320E1x devices, the entire program has to be on chip, so nothing extra needs
to be done.

Conclusion 

The RBIT is a code integrity and security feature. Using a TMS320E25 with the 
RBIT set requires familiarity with the rules cited above. 
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Addressing Peripherals as Data Structures in C 

Contributed by Nat Seshan and Mark Utter 



Design Problem How can I manipulate a DSP's device peripheral-specific registers in C? 

A data structure is usually assigned to .bss by the C compiler. A peripheral such as a 
serial port has control registers with an address different from .bss. The problem is to 
connect the two. 



Method 1: Use a pointer to the peripheral. 

[Diagram: Pointer -> Address = 0x8000 (peripheral as memory locations)]

1.1 First declare a structure that logically represents the memory locations of the
peripheral.

    struct controller {
        unsigned int status;
    };

1.2 Declare a pointer to the structure and initialize it to the peripheral's address.

    struct controller *IFperipheral = (struct controller *) 0x8000;

1.3 In your code, access the peripheral's memory values indirectly.

    IFperipheral->status = 0;
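
One refinement that is commonly applied to Method 1 is to mark the register fields `volatile`, so that the compiler does not cache or drop accesses to the hardware. The sketch below is an illustration under that assumption, not part of the original note. The `ON_TARGET` switch, the host-side stand-in `fake_regs`, and the helper names are hypothetical; they exist only so that the same code can be exercised without the peripheral.

```c
/* Register layout of the peripheral, as in step 1.1. */
struct controller {
    volatile unsigned int status;   /* volatile: every access must reach the device */
};

#ifdef ON_TARGET
/* On the DSP the pointer is fixed at the peripheral's address (step 1.2). */
#define PERIPH_BASE ((struct controller *)0x8000)
#else
/* Host-side stand-in (hypothetical) so the sketch runs without hardware. */
static struct controller fake_regs;
#define PERIPH_BASE (&fake_regs)
#endif

static struct controller *const IFperipheral = PERIPH_BASE;

/* Clear the status register, as in step 1.3. */
void periph_clear_status(void)
{
    IFperipheral->status = 0;
}

/* Set selected bits with a read-modify-write of the register. */
void periph_set_bits(unsigned int mask)
{
    IFperipheral->status |= mask;
}

/* Read the register back. */
unsigned int periph_read_status(void)
{
    return IFperipheral->status;
}
```

Wrapping the accesses in small functions keeps the address in one place, which also makes it easier to switch to Method 2 later.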
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Method 2: Placing the structure in its own section. 

2.1 Declare a peripheral instead of a pointer.

    struct controller IFperiph;

2.2 Use inline assembly to give the structure its own section.

    asm("_IFperiph .usect \"periph\", 128");   /* 128 is size of struct */

This creates a user-defined section that can be linked to any address.

2.3 Use your linker command file to map the section to memory.

    periph: load = 0x8000

2.4 Address the structure elements directly.

    IFperiph.status = 0;

Both methods work. Sometimes the pointer method is most efficient. Other
times, the second method is best. Method 1 is very useful for addressing peripherals
or memory buffers that are device specific. Method 2 is preferred for addressing
peripherals or memory buffers that are not device specific (i.e., peripherals that are
user specified). This method leaves the task of mapping and aligning user-specific
peripherals and/or memory buffers to the linker. The choice depends on your
individual application. For more information, read the TMS320C30 Peripheral
Runtime Support Library User's Guide. Also see: DSP Applications Using the 'C30 EVM,
"C Coding Tips for Application-Specific Processors."






TMS320 DSP Designer's Notebook, Number 31, Texas Instruments


Interrupts in C on the TMS320C3x 

Contributed by Tim Grady 



Design Problem How do I use interrupts from C?



There are several parts to this problem: (1) writing the ISR, (2) initializing the inter- 
rupt vector table, and (3) linking the parts together in the linker command file. 

A C Language ISR 

The C compiler requires that each ISR be named as follows: 

    void c_int0n(void)   /* n is the int number */
    {
        /* a C function that is an ISR */
    }

The interrupt may not return a value and has no arguments. The C compiler 
recognizes this naming convention and treats it as a normal ISR, which means it 
performs a context save where needed and returns from the routine via a RETI 
instruction. 
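
As an illustration of the convention, the sketch below shows an ISR that only records the event and leaves the real work to the main loop. The name `c_int01` follows the naming rule above; the counter `int1_count` and the helper `poll_int1` are hypothetical. On a host machine the ISR can be called directly as an ordinary function to exercise the logic.

```c
/* Shared between the ISR and the main program, so it is volatile. */
static volatile unsigned int int1_count = 0;

/* ISR: no arguments, no return value; the name follows the convention above. */
void c_int01(void)
{
    int1_count++;            /* keep the ISR short: just note the event */
}

/*
 * Called from the main loop. Returns how many interrupts arrived since
 * the last call. On a real target the interrupt would typically be masked
 * around the read-and-clear pair so that no event is lost in between.
 */
unsigned int poll_int1(void)
{
    unsigned int n = int1_count;
    int1_count = 0;
    return n;
}
```

Keeping the ISR this small also limits the amount of context the compiler has to save on entry.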

A good practice is to include the interrupts in a separate file called ints.c or some- 
thing similar. This makes for a more modular style, simpler maintenance, and easier 
to understand software. 

The Interrupt Vector Table 

The first 40h addresses are reserved for the interrupt and trap vectors. Address 0
(zero) holds the address of the reset routine. If using C linker options, the RTS30.lib
function 'boot.asm' takes care of defining the reset function, but the vector table
initialization is left to the user. You can do so with either C or assembly language.
An assembly language routine might look like this:



; file name is vectors.asm
        .sect   "vectors"       ; a new section begins here
        .word   _c_int00        ; the address of the reset vector
        .word   _c_int01        ; the ISR for interrupt 0
        .word   _c_int02        ; the ISR for interrupt 1
; etc.
; end
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This routine creates a new section which is merely a list of addresses where the 
interrupt routines can be found. It can be written in C by encapsulating each line 
in an asm statement. 

For example: 

    asm(" .sect \"vectors\"");
A C function that is an ISR. 

Linking Them Together 

The linker command file provides the mechanism for including the vectors.asm 
object and the ints.c object. 

    /* file name == mylink.cmd */
    vectors.obj
    ints.obj

The MEMORY section needs to identify the location of the int vectors. 

    MEMORY
    {
        VECTORS: origin = 0h, length = 40h
    }

The SECTIONS section needs to map the user-defined section called 
"vectors" to the memory location. 

    SECTIONS
    {
        vectors: > VECTORS
    }

Summary 

Writing interrupt routines in C is straightforward as long as you follow the
simple rules set out in this note. You must also make sure to generate the interrupt
vector table and to provide the linker with all the necessary information to link
the ISRs, vector table, and section names into the correct locations.

Clearly there are variations on this theme. Some ISRs can be written in C and 
some in assembly so long as the naming conventions and vector tables are 
followed and initialized. 
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TMS320C40 Emulator Tips 



Contributed by Rosemarie Piedra



Design Problem What special precautions need to be taken when working with the TMS320C40 
emulator? 



Solution 



General Issues 



1. The debugger will break any pending CPU/DMA access that is not completed
within a time-out period (1 second) during single-stepping or after an emulator
halt. This could happen in the following situations:

a. During DMA/CPU reads/writes from/to comm ports without comm port syn- 
chronization. In the case of the DMA, this will even cause the DMA counter 
to decrement by one. 

b. During execution of a large RPTS execution. 

c. During interlocked instructions. 

Current debugger versions (2.20 or higher) will send a "processor access time- 
out adr=xxxx" message if a read or write access doesn't complete. However, 
debugger version 2.01 or lower may not send any warning message if a read 
access is broken. This "incomplete read" can be misinterpreted as an access com- 
pletion. 

The adr=xxxx provided in the message will "approximately" correspond to
the address where the time-out occurs when this is the result of a "debugger"
access time-out (for example, from displaying memory). In the case of a CPU/DMA
time-out, what you will probably receive is an address (0xaaaababe)
that is not meaningful. In this case, the three to four instructions preceding
the PC value should give you an indication of where the time-out occurs.

2. The 'C40 debugger (version 2.01 or lower) executes a "double read" when accessing
memory locations (including on-chip memory and memory-mapped peripherals).
Debugger version 2.20 restricts this problem to memory locations 0x0 to 0x0fff.
If you have external FIFOs mapped into this region, be careful when using any
emulator command that displays/reads those locations.

3. Remember, when you "quit" a debugger session, the processor will remain
halted (by the JTAG circuit) unless a "runf" command has been issued before
the "quit" command. Another way to "unhalt" the 'C40 is to issue an "emurst"
from the DOS/OS2 command prompt.

4. File sharing: 

When using the OS2 emulator with C programs, you cannot edit in another win- 
dow the C source file that is currently being used for the debugger. The debugger 
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"locks" access to the last five C-source files displayed in the file window during 
the current debugger session. 

To overcome this limitation without quitting the debugger, you can create an 
alias "free" command that will release ownership of the previous files: 

    alias free, "file dummy.c; file dummy.c; file dummy.c; file dummy.c; file dummy.c"

This is not an issue on the SUN or VAX platforms.

Comm Ports

Comm port logic stops when the emulator is halting the device (in a breakpoint 
or between single steps). However, you can still work with the comm ports under 
emulator single-stepping but the following precautions should be taken: 

1 . Single-stepping through comm port transfers: 

If the receiving side is halted by the emulator, the comm port logic will receive 
one word into the IFIFO and stop after that. Each assembly single-stepping 
will produce one word being received. 

If the sending side is halted by the emulator, the comm port will send one
complete word if you single-step one more time after the instruction that writes
into the OFIFO. Without this additional single step after the "store" to
OFIFO instruction, the emulator will break any pending comm port byte
transfer and the receiver end could get only one of the four bytes, causing the
comm port byte counter to go out of sync.

Summarizing, for comm port transfer, the following sequence should work: 
step (sender) - step (sender) - step (receiver). 

2. Don't display comm port FIFOs on any debugger window. The debugger in 
fact will issue a read/write from/to the IFIFO/OFIFO and the data will be gone. 

3. In a multiprocessing distributed-memory system, avoid issuing a "reset" com- 
mand in an individual debugger window. A 'C40 debugger "reset" command 
in fact resets the device and changes the direction of 'C40 comm ports to their 
status after reset. This could create bus contention with the comm port con- 
nected on the other end that could potentially damage 'C40 comm port drivers. 
The safest way to "reset" the multiprocessing board without quitting the debug- 
ger sessions is to "run free," do a hardware reset to the entire board, and then 
"halt." 

Interrupts 

1 . Interrupts are disabled during assembly single-stepping. This feature is used to 
avoid receiving an interrupt after every single-step with real-time external in- 
terrupts. If you wish to take interrupts during single-stepping, use the "run 1" 
command that is totally equivalent to the single-stepping key except that it 
doesn't disable interrupts. Interrupts are always enabled during C 
single-stepping. 

2. You can manually reproduce an external interrupt by setting the IIF bits as 
follows: 

• select pin as an interrupt pin (FUNCx=1),

• select edge trigger interrupt (TYPEx=0): Level trigger will not work!

• enable external interrupt (EIIOFx=1),

• assert external interrupt (FLAGx=1),

• enable GIE bit.






TMS320 DSP Designer's Notebook, Number 33, Texas Instruments


Floating-Point C 

Contributed by Karen Baldwin 



Part I



Design Problem 



What are some of the tricks of the masters for making the most out of the C
Compiler?

1. Solving the 'C40 Discontinuity Issue with Indirect Calls

The 'C40 has only relative jumps. Therefore PC discontinuities using direct-mode 
addressing are limited to a 24-bit range. This, coupled with the 'C40 memory map, 
makes it impossible to directly call a routine in on-chip memory because the com- 
piler uses the direct form of CALL for calls to named functions. (The BR instruction 
is never used: all branches except returns must be within a single function, so the 
short conditional form is used.) 

You can use indirect calls to call functions anywhere in the address space. You do 
this by declaring a pointer to a function and then calling via the pointer. 

Listing 1 : Indirect Function Call 

    int f();                    /* function that resides in internal mem */
    int (*ptr_to_f)() = f;      /* pointer to function f */

    main()
    {
        (*ptr_to_f)();          /* call function f indirectly */
    }



2. Making use of Relocatable C code 

You can specify different load addresses and run addresses for a section in the linker
command file, but you have to write your own loader to move the section to the run
address. If using assembly language, you can use the .label directive to get the load
address. Here is an example:



Listing 2: Naming Load and Run Addresses 

            .global sec_start       ; run address
            .global sec_end
            .global l_sec_start     ; load address
            .global l_sec_end
sec_start:  .label  l_sec_start     ; l_sec_start will contain the
                                    ; load address (sec_start = run
                                    ; address)

            ; program code goes here

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sec_end: .label l_sec_end ; same explanation 

Your loader makes use of these labels to write from l_sec_start to sec_start and 
so on. The loader is a loop that copies the code from one location to the other. 
The C version is shown in listing 3. 

Listing 3: A Homemade Loader in C 

    /* also asm() with .global */
    asm("sec_start .label l_sec_start");

    /* program code goes here */
    void func(a,b,c)
    {
    <local variable declarations>
    }

    asm("sec_end .label l_sec_end");

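
The copy loop itself is not shown in the listings. A minimal sketch of such a loader in C follows. The function name `copy_section` and the use of `unsigned int` as the word type are assumptions; how the labels from Listing 2 are made visible to C code depends on the toolchain and is not shown here. On the target, the arguments would be the run address (sec_start) and the load range (l_sec_start to l_sec_end).

```c
#include <stddef.h>

/*
 * Copy a section from its load address to its run address, one word at a time.
 *   run_start  : where the code must execute from     (sec_start)
 *   load_start : where the image was placed at load   (l_sec_start)
 *   load_end   : first word past the loaded image     (l_sec_end)
 * Returns the number of words copied.
 */
size_t copy_section(unsigned int *run_start,
                    const unsigned int *load_start,
                    const unsigned int *load_end)
{
    size_t words = 0;

    while (load_start < load_end) {
        *run_start++ = *load_start++;
        words++;
    }
    return words;
}
```

The loader must itself run from a section whose load and run addresses are the same, since it executes before any relocation has taken place.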
3. Making a C Function Part of a Different Section 

The C compiler does not directly specify a section name for the executable assem- 
bly code that it generates. It relies on the assembler defaulting the section name 
to .text. However, it is possible to relocate the executable code from a function 
into a user-defined section from within the C source. 

This is accomplished by placing an asm statement that declares the new sec- 
tion before the actual function definition. The example below uses a macro defi- 
nition to provide a general solution to defining named sections. 

    #define sect(a) asm(" .sect " #a)

    sect("pp");     /* Creates .sect "pp" in */
                    /* asm code */

    void func() { }

The section name will remain "pp" until changed. If other functions follow 
this function in the file and their code is not to be included within this section, 
then reset the section name before the next function definition. 



/* same as assembly */ 
/* version */ 
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Guidelines for Using Decoupling Capacitors on DSP Designs 

Contributed by Ralph Weir and Gene Frantz



Design Problem What are some guidelines for selecting decoupling capacitors for TMS320C3x 
devices? 

Solution On page 13-14 of the TMS320C3x User's Guide, there is a note that recommends
using 0.1 µF decoupling capacitors on the V<y pins of the TMS320C31. Here we
will provide tips on the number and types of capacitors you should use.

1) On a multilayer PCB with separate VCC/V<y planes, half a dozen 0.1 µF will be
ample. This assumes that other devices (e.g., memories) are decoupled
adequately. Ceramics are the right choice. The real rule is to find capacitors with no
inductive component. Electrolytics, for example, have a significant amount of
inductance due to their manufacturing method. That is why in many switching
regulators you find a large electrolytic in parallel with a small ceramic capacitor.

Note: You can use capacitors in the range of 0.01 µF to 0.2 µF. The purpose
is to eliminate noise spikes. You may need to tune the circuit for a specific noise
component.

2) Put the "holes" necessary to bypass every V<y pin on the board such that the 
capacitors can be auto attached. This not only provides the places for the capaci- 
tors, but allows for repair procedures in the future when a noise problem occurs. 
Don't populate all of the locations for cost reasons, but have the vacant locations 
there if needed. A corollary is to put the pads down for bypass capacitors on 
other signals that might need them. Actually, if well thought out, other compo- 
nents besides capacitors can be placed in these locations. For example "bypass 
capacitors" on address, data, and control lines. Later these could be used for many 
other things. The simple rule is that PCBs are expensive to modify or repair. 

3) If a board has DRAM on it, make sure that you have slightly more 0.1 µF
ceramics, but add a couple of bigger tants, e.g., 10 µF. This is because a DRAM
refresh cycle eats juice and can cause the whole board's power rails to drop.

4) TMS320C31s will run on a double-layer PCB. While not the best way, if you
want to try you MUST decouple every single power pin, and add a 10 µF tant
nearby as well. Also make sure that the power tracks are substantial.

5) If a board has problems after following these guidelines, try decoupling with addi- 
tional mylar capacitors. These have a great frequency response but will add to 
the power consumption. 
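
The remark in guideline 1 about inductance can be made concrete with a simple model: a real capacitor behaves roughly like an ideal capacitance in series with a small parasitic inductance, so its impedance is |2πfL − 1/(2πfC)|. The sketch below evaluates that expression. The model ignores series resistance, and the function name `cap_impedance` and all component values in the note that follows are illustrative assumptions, not measured data.

```c
/*
 * Magnitude of the impedance (ohms) of a capacitor modeled as an ideal
 * capacitance c_farads in series with a parasitic inductance l_henries,
 * evaluated at frequency f_hz. Series resistance is ignored.
 */
double cap_impedance(double c_farads, double l_henries, double f_hz)
{
    const double two_pi = 6.283185307179586;
    double x_l = two_pi * f_hz * l_henries;          /* inductive reactance  */
    double x_c = 1.0 / (two_pi * f_hz * c_farads);   /* capacitive reactance */
    double x = x_l - x_c;

    return x < 0.0 ? -x : x;
}
```

With an assumed 0.1 µF ceramic carrying 1 nH of inductance and an assumed 10 µF electrolytic carrying 20 nH, the model gives the ceramic the lower impedance at 50 MHz and the electrolytic the lower impedance at 1 kHz. That is the reasoning behind pairing a large electrolytic with a small ceramic.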
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TMS320C5x Interrupts and the Pipeline 

Contributed by Mansoor Chishtie 



Design Problem If an interrupt is pending and enabled, what will happen when the TMS320C5x re- 
turns from the pending ISR? Will the next two instructions execute followed by a 
RET or will it fall through? 

Given the following situation: 



        ; an int is enabled and pending

        CLRC    INTM            ; Enable interrupts
        RETD                    ; This one instruction is
                                ; guaranteed to execute
                                ; after the CLRC.
        LAR     AR1,#0800h
        MAR     *,AR1
        RET                     ; Return occurs here

MPY_2   LT      *+
        MPY     *+



A delayed-instruction and the next two instruction words (in the two delay slots) 
are uninterruptible. If the delayed instruction happens to be a return-delayed then 
'C5x will execute the two instructions following that, execute return and then take 
the interrupt trap. 

Since 'clrc intm' would not enable interrupts until the following instruction is 
executed, which happens to be a return-delayed, any pending interrupt will be taken 
AFTER executing the return and the two delayed instructions: 



        clrc    intm
        retd
        lar     ar1,#2
        mar     *,ar1           ; these 4 instructions uninterruptible



calling_prog: ;<-ISR will return here 
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Latency for ISRs Written in C 

Contributed by Alex Tessarolo



Design Problem In some control operations, the interrupt context save time must be minimized as
much as possible and higher-priority interrupts must be delayed as little as possible
to minimize interrupt latency. In the standard C interrupt service routines for the
'C2x/'C5x/'C3x/'C4x, the context save and restore functions are not fully optimized.

The example below shows how the interrupt service context save time can be
reduced by 34% (24 cycles out of 70 for INT1) and the interrupt latency minimized
by replacing the standard I$$SAVE and I$$REST context save/restore functions with
optimized versions. The 'C2x is shown as an example, but the same concepts may be
applied to other DSPs in the TI family.



Example: 

For this example, we assume external INT1 is the highest priority interrupt 
and INT2 is a lower priority interrupt. We require that INT2 disable inter- 
rupts and enable INT1 to occur with a minimum interrupt latency: 

The following is the interrupt vector file: 

        .ref    _c_int0, Int1CTXT, Int2CTXT

SP      .set    AR1

        .sect   "vectors"
RESET   b       _c_int0             ; External Reset.
INT1    b       Int1CTXT, *,SP      ; External H/W Interrupt 1.
INT2    b       Int2CTXT, *,SP      ; External H/W Interrupt 2.



Figure 1. 
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The following is an assembly language file with the interrupt service context save and restore functions for 
INT1 and INT2: 

        .ref    _Int1, _Int2
        .def    Int1CTXT, Int2CTXT

SP      .set    AR1
        .text

; External H/W Interrupt 1, ISR context save/restore:
; Benchmark = 46 cycles (includes branch, call & ret, ret)
; Notes: - Interrupts disabled for duration of ISR.
;        - The above benchmark is 24 cycles (34%) faster than
;          the standard I$$SAVE, I$$REST functions.



Int1CTXT:                       ; Assumed ARP -> SP.
        mar     *+              ; Increment stack pointer.
        sst1    *+              ; Save ST1.
        sst     *+              ; Save ST0.
        sacl    *+              ; Save ACCL.
        sach    *+              ; Save ACCH.
        popd    *+              ; Save top two levels of
        popd    *+              ; H/W stack only.
        spm     0
        sph     *+              ; Save PH.
        spl     *+              ; Save PL.
        mpyk    1               ; Save T.
        spl     *+
        sar     AR2,*+          ; Save aux registers that
        sar     AR3,*+          ; are not saved by C compiler.
        sar     AR4,*+
        sar     AR5,*+
        call    _Int1           ; Call C ISR.
        mar     *-              ; Decrement stack ptr.
        lar     AR5,*-          ; Restore aux registers.
        lar     AR4,*-
        lar     AR3,*-
        lar     AR2,*-
        lacl    *-              ; Temp save T in ACCL.
        lt      *-              ; Restore PL.
        mpyk    1
        lph     *               ; Restore PH.
        sacl    *               ; Restore T.
        lt      *-
        pshd    *-              ; Restore two levels of H/W
        pshd    *-              ; stack only.
        lacc    *-,16           ; Restore ACCH.
        adds    *-              ; Restore ACCL.
        lst     *-              ; Restore ST0.
        lst1    *-              ; Restore ST1.
        eint                    ; Global interrupt enable.
        ret                     ; Return to interrupted code.



Figure 1. (continued) 
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External H/W Interrupt 2, ISR context save/restore: 

Benchmark = 61 cycles (includes branch, call & ret, ret) ; 
Notes: Higher priority interrupts disabled for 20 cycles. 



Int2CTXT:                       ; Assumed ARP -> SP.
        mar     *+              ; Increment stack pointer.
        sst1    *+              ; Save ST1.
        sst     *+              ; Save ST0.
        sacl    *+              ; Save ACCL.
        sach    *+              ; Save ACCH.
        ldpk    0               ; DP -> 0.
        pshd    IMR             ; Save IMR.
        lack    00000001b       ; Set mask to enable INT1.
        and     IMR             ; Mask with IMR.
        sacl    IMR             ; Set IMR.
        rptk    3               ; Save top four levels of
        popd    *+              ; H/W stack only.
        eint                    ; Global interrupt enable.
        spm     0
        sph     *+              ; Save PH.
        spl     *+              ; Save PL.
        mpyk    1               ; Save T.
        spl     *+
        sar     AR2,*+          ; Save aux registers that
        sar     AR3,*+          ; are not saved by C compiler.
        sar     AR4,*+
        sar     AR5,*+
        call    _Int2           ; Call C ISR.
        mar     *-              ; Decrement stack ptr.
        lar     AR5,*-          ; Restore aux registers.
        lar     AR4,*-
        lar     AR3,*-
        lar     AR2,*-
        lacl    *-              ; Temp save T in ACCL.
        lt      *-              ; Restore PL.
        mpyk    1
        lph     *               ; Restore PH.
        sacl    *               ; Restore T.
        lt      *-
        dint                    ; Global interrupt disable.
        rptk    3               ; Restore four levels of H/W
        pshd    *-              ; stack only.
        ldpk    0               ; DP -> 0.
        popd    *-              ; Restore IMR.
        lacc    *-,16           ; Restore ACCH.
        adds    *-              ; Restore ACCL.
        lst     *-              ; Restore ST0.
        lst1    *-              ; Restore ST1.
        eint                    ; Global interrupt enable.
        ret                     ; Return to interrupted code.



Figure 1. (continued) 
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The following is an example C file with the interrupt service routines: 

    void main( void )
    {
        /* user code */
    }

    void Int1( void )
    {
        /* INT1 Interrupt service code */
    }

    void Int2( void )
    {
        /* INT2 Interrupt service code */
    }

Figure 1. (continued)



Similar techniques can be applied to the 'C5x, 'C3x, and 'C4x compilers to im- 
prove ISR performance in C. 
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Serial ROM Boot 

Contributed by Alex Tessarolo 



How do I use a serial ROM for minimum form factor storage of boot code? 



With ever increasing speeds of DSP devices such as the 'C5x, and the increasing de- 
mand for reduced form factors such as in HDD applications, there is a need for inte- 
grating the program memory into the DSP device. For volume production whereby 
the program code has been fully debugged and changes are not foreseen, then 
masked ROM is appropriate. However, for prototyping and preproduction evalu- 
ation, a programmable device is desirable, i.e., a DSP with EPROM memory. How- 
ever, EPROM technology has not kept pace with the speed requirements of a DSP 
and hence other solutions need to be found in the interim. One solution is to popu- 
late the DSP with internal RAM that can be programmed at boot time from an ex- 
ternal source. This external source can be either a coprocessor or a smaller form 
factor ROM. To meet the need of minimum board area, a serial ROM is an ideal de- 
vice for such applications. 

Serial ROM Description 

A serial ROM is a memory device that is addressed sequentially one bit at a time. 
Data within the ROM cannot be accessed randomly. An internal address generator 
points to a single data bit and is automatically incremented by an external clock sig- 
nal. The address generator is reset by a separate external signal to begin a new trans- 
fer. Typically data can be clocked out at a rate of 5 MBits/second on such devices. 

A serial ROM can be either a one-time programmable (OTP) or electrically-eras- 
able and programmable (EEPROM) device. The serial ROMs described in this appli- 
cation note are manufactured by Xilinx and are OTP devices with capacities ranging 
from 36,288 bits up to 131,072 bits. Larger devices are in the pipeline. With a
capacity of 131K bits, up to 8K words (1 word = 16 bits) of DSP program or data can be
stored in such a device. For example, a serial ROM interfaced to a DSP device such 
as the TMS320C53 with 4K words of internal program/data RAM could be pro- 
grammed to run a fairly sizable program internally. The program would be trans- 
ferred by a boot loader program which resides in masked memory within the DSP 
and is initiated at power up or reset. 

Serial ROMs typically come in 8-pin DIP or SOIC packages (20-pin PLCC also 
available) and therefore have a small form factor. Pinouts for such a device are 
shown below: 






Serial ROM/DSP Connection 

Only three connections are needed between the serial ROM and the DSP. The
FSX signal is used as the reset input to the ROM. The BIO input is used as the
data sampling input signal, and the clock line is driven by the XF output. All of
the above signals are under software control.

A typical software kernel written for either the 'C2x or 'C5x DSP that will 
transfer the contents of the serial ROM to the DSP memory is described below: 



        LRLK    AR2,#dest_addr      ; AR2 = destination address pointer.
        LRLK    AR3,#no_of_words    ; AR3 = number of words to transfer.
        STXM                        ; FSX configured as an output. Serial ROM enabled.
        LARP    AR1
Outer_Loop:
        ZAC
        LARK    AR1,#16             ; AR1 = bit counter. Initialized to 16 bits.
Inner_Loop:
        SXF                         ; Drive XF = CLK high.
        SFL                         ; Shift ACC left 1 bit.
        RXF                         ; Drive XF = CLK low.
        BIOZ    Inner_Loop,*-       ; Branch if bit = 0 and decrement bit counter (AR1).
        ADDK    #1                  ; Bit must be = 1.
        BANZ    Inner_Loop,*        ; Branch if bit counter (AR1) does not = 0.
        LARP    AR2                 ; One word read from serial ROM.
        SACL    *+,0,AR3            ; Store word in destination address (AR2).
        BANZ    Outer_Loop,*-,AR2   ; Loop until all words transferred.



Figure 1. 



The above program running from a 50-ns 'C5x DSP can transfer 8K words of data 
from a 128K x 1-bit serial ROM in about 50 ms. This is an effective data transfer 
rate of approximately 2.5 MBits/sec. 
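
The bit ordering used by the kernel can be checked with a small host-side model. The sketch below is not 'C5x code: the SerialRom structure and the rom_reset, rom_clock_bit, rom_read_word, and rom_transfer names are hypothetical stand-ins for the reset line, the clock line, and the data input. It assumes the ROM shifts data out most-significant bit first, which matches the shift-left-then-add-one pattern of the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a serial ROM: a bit array plus an internal address. */
typedef struct {
    const uint8_t *bits;   /* one ROM bit per array element (0 or 1) */
    size_t         nbits;  /* total number of bits in the ROM */
    size_t         addr;   /* internal address generator */
} SerialRom;

/* Reset the address generator to begin a new transfer. */
static void rom_reset(SerialRom *rom)
{
    rom->addr = 0;
}

/* One clock pulse: return the current bit and advance the address. */
static int rom_clock_bit(SerialRom *rom)
{
    int bit = (rom->addr < rom->nbits) ? rom->bits[rom->addr] : 0;
    rom->addr++;
    return bit;
}

/* Assemble one 16-bit word; the first bit received ends up in the MSB. */
static uint16_t rom_read_word(SerialRom *rom)
{
    uint16_t acc = 0;
    for (int i = 0; i < 16; i++) {
        acc = (uint16_t)(acc << 1);   /* shift accumulator left 1 bit */
        if (rom_clock_bit(rom))
            acc |= 1u;                /* sampled bit was a 1 */
    }
    return acc;
}

/* Transfer nwords words from the start of the ROM into dest. */
static void rom_transfer(SerialRom *rom, uint16_t *dest, size_t nwords)
{
    rom_reset(rom);
    for (size_t w = 0; w < nwords; w++)
        dest[w] = rom_read_word(rom);
}
```

Each word costs 16 clock pulses, so the transfer time grows linearly with the number of words requested.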

Serial ROM Data Format 

A typical data storage format inside the serial ROM is shown below. The first
word is a serial ROM detect sequence for the boot loader program. The following
words contain the data to be transferred. Multiple blocks can be transferred to
various destination addresses. The block size and destination address are found in
the first two words of each block. The data words then follow, and at the end of
the block there is a 32-bit check-sum. The check-sum is the sum of all the
data words within the block (excluding block size and destination address). The
next block to transfer immediately follows the check-sum. If the first word read is
a zero, then there are no more blocks to follow.
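
The block format just described can be walked with a short loop. The following is a minimal sketch in C, not the boot loader itself. Several details are assumptions made only for illustration: the ROM contents are already available as an array of 16-bit words, the 32-bit check-sum is stored as two words with the high word first, the detect word value is supplied by the caller, and destination addresses index a flat memory array. The load_blocks name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of blocks loaded, or -1 on a format or check-sum error.
 * Data words are copied before the check-sum is verified, so mem may be
 * partly written when -1 is returned. */
static int load_blocks(const uint16_t *rom, size_t rom_words,
                       uint16_t detect_word,
                       uint16_t *mem, size_t mem_words)
{
    size_t pos = 0;
    int blocks = 0;

    /* The first word must be the serial ROM detect sequence. */
    if (rom_words < 1 || rom[pos++] != detect_word)
        return -1;

    for (;;) {
        if (pos >= rom_words)
            return -1;                      /* ran off the end of the ROM */

        uint16_t size = rom[pos++];
        if (size == 0)
            return blocks;                  /* zero size: no more blocks */

        /* Need the destination word, the data, and two check-sum words. */
        if (rom_words - pos < (size_t)size + 3)
            return -1;

        uint16_t dest = rom[pos++];
        if ((size_t)dest + size > mem_words)
            return -1;                      /* block does not fit in memory */

        uint32_t sum = 0;
        for (uint16_t i = 0; i < size; i++) {
            uint16_t w = rom[pos++];
            mem[dest + i] = w;
            sum += w;                       /* check-sum covers data words only */
        }

        /* Assumed layout: high word first, then low word. */
        uint32_t stored = ((uint32_t)rom[pos] << 16) | rom[pos + 1];
        pos += 2;
        if (stored != sum)
            return -1;

        blocks++;
    }
}
```

A loader that must never leave partial data behind would verify the check-sum in a first pass and copy in a second.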
Mastering the 'C4x DMA

Contributed by Rosemarie Piedra 



Design Problems What are the basic differences between the 'C3x and 'C4x DMA? 

What to do if the DMA is slower than expected or never finishes? 

How to program the 'C4x DMA? 

Examples? 

Solution The 'C4x DMA is one of the most powerful DMAs available on the market. It offers
features not available in traditional DSPs.

Basic Differences Between 'C3x and 'C4x DMAs 

'C3x and 'C4x DMAs are functionally similar. The 'C4x adds the following features:

1. More DMA channels (1 for the 'C3x vs. 6/12 for the 'C4x).

2. The 'C4x DMA is faster: the 'C3x DMA requires one cycle of internal register
setup time when the DMA is reading from external memory. The 'C4x DMA
doesn't require this extra cycle.

3. The 'C4x DMA has more features than the 'C3x DMA, such as autoinitialization
mode, bit-reversed addressing, and split mode.

4. The 'C4x DMA allows you to control the priority between CPU and DMA.
The 'C3x DMA always has lower priority than the CPU.

5. The 'C4x DMA interrupts are totally independent of the CPU interrupts. The
'C4x DMA doesn't require an instruction fetch boundary to acknowledge the
interrupt. The 'C3x DMA detects DMA interrupts only on fetch boundaries.

What to do if the DMA is Slower Than Expected or Never Finishes?

The maximum sustained data transfer rate of the 'C40 DMA is one word every two
cycles (50 MB/s for a 50-MHz 'C4x), provided that the memory is 0-wait-state and
that the DMA has higher priority than the CPU (or that there are no conflicts).
If DMA reads and writes to external memory are interleaved, the maximum
sustained rate is 33 MB/s (one word transferred every three cycles: one for
read and two for write). Arbitration between DMA channels does not impose any
overhead cycles.
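
The quoted rates follow from simple arithmetic, sketched below in C. The dma_rate_mbytes helper is hypothetical. It assumes that one device cycle lasts two input-clock periods (40 ns at 50 MHz) and that one word is 4 bytes, which are the assumptions that make the 50 MB/s and 33 MB/s figures consistent.

```c
/* Sustained DMA rate in MB/s.
 * clock_mhz:       input clock frequency in MHz (50 for a 50-MHz device)
 * cycles_per_word: device cycles needed per 32-bit word transferred */
static double dma_rate_mbytes(double clock_mhz, int cycles_per_word)
{
    double cycle_ns = 2000.0 / clock_mhz;        /* assumed: cycle = 2 clock periods */
    double word_ns  = cycle_ns * cycles_per_word;
    return 4.0 * 1000.0 / word_ns;               /* 4 bytes per word; 1 byte/ns = 1000 MB/s */
}
```

Under these assumptions, two cycles per word gives 50 MB/s and three cycles per word gives about 33.3 MB/s, matching the figures above.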
The following factors may slow down or even stop the DMA:

1. Contention with the CPU: Even though the DMA has its own internal buses,
CPU/DMA memory access conflicts may exist. You can avoid them by allocating
DMA source and destination addresses on buses that the CPU is not using at
that time. The 'C4x offers two external buses and a dual-access on-chip RAM.
Study the 'C4x block diagram carefully (Figure 2-1 in the 'C4x User's Guide,
1993) to discover possible contentions.

A double-buffering scheme with two data buffers in the system (one for CPU
processing and one for DMA transfer) being switched between CPU and
DMA may help in some applications. An example of this can be found in
"Parallel 2-D FFT Implementation with TMS320C4x FFTs" (SPRA031). If
contention with the CPU cannot be avoided, select the DMA priority (bits 0,1 in
the DMA control register) that best suits your application.

2. Source and/or destination addresses are not ready.
This may be caused by:

• a non-zero-wait-state memory. Remember, after reset the default value for
external wait states is 7; therefore you have to set the global and local
memory-control registers to your specific settings.

• an interrupt not being received if using DMA read and/or write
synchronization. For example, reading (writing) with sync mode from/to the comm
ports can only take place when there is data in the input FIFO (or when
there is space in the output FIFO).

3. The DMA data transfer rate is slower in the sync transfer mode because it
takes two cycles to reset the request from the interrupt. Therefore the
maximum transfer rate in the sync mode is one word every four cycles. However,
these two extra cycles can be absorbed if multiple DMA channels are running at
the same time, and most of the time the effect is negligible. Refer to section
9.11.2 of the 'C4x User's Guide, 1993.

If neither of the first two factors explains why a DMA transfer never finishes,
take a look at the DMA register values. Wrong values in any of the nine DMA
registers may indicate a programming error. Take a special look at the start
and status bits of the DMA control register to detect whether the DMA has been
halted.

Programming the 'C4x DMA

The DMA is a memory-mapped peripheral. Therefore you can easily program it
from C as well as from Assembly. The following examples are provided
in this application note:

*** Unified Mode DMA ***

1. Example 1: Unified-mode DMA transfers data between comm ports using
read sync.

2. Example 2: Unified-mode DMA uses autoinitialization (method 1) to transfer
two data blocks.

3. Example 3: Unified-mode DMA uses autoinitialization (method 2) to transfer
two data blocks.

*** Split Mode DMA ***

4. Example 4: Split-mode auxiliary DMA transfers data between comm ports
using read sync.

5. Example 5: Split-mode auxiliary and primary channel send/receive data to and
from comm ports.

6. Example 6: Split-mode DMA autoinitializes both auxiliary and primary channels
(auxiliary transfers one block and primary transfers two blocks).

You can compile those examples by typing:

goex example1

(this invokes a batch file that runs the compiler and linker). Source code and batch
files can be downloaded from the BBS (filename: C4xdmaex.exe).

You can find examples of DMA programming in Assembly language in the 'C4x
User's Guide (Chapter 12). Also, you can use a C-callable Assembly routine to
achieve the same result. Refer to the set_dma.asm routine in [1] for source code.
Here is an example of how it can be invoked (use registers for parameter passing
to reduce instruction cycles):

set_dma(DMAADDR, CTRLREG, SRC, SRC_IDX, COUNTER, DST, DST_IDX, LINK_PTR);
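
Because the DMA is memory-mapped, a routine of this kind amounts to a series of stores into the channel's registers. The sketch below illustrates the idea in C and is not the set_dma.asm routine. The DmaRegs layout, the field order, the two auxiliary fields, and the set_dma_sketch name are assumptions made for illustration; consult the 'C4x User's Guide for the actual register map.

```c
#include <stdint.h>

/* Hypothetical image of the nine registers of one DMA channel.
 * The field order is an assumption, not taken from the User's Guide. */
typedef struct {
    volatile uint32_t control;
    volatile uint32_t src;
    volatile uint32_t src_idx;
    volatile uint32_t counter;
    volatile uint32_t dst;
    volatile uint32_t dst_idx;
    volatile uint32_t link_ptr;
    volatile uint32_t aux_counter;
    volatile uint32_t aux_link_ptr;
} DmaRegs;

/* Load one channel. Argument order mirrors the set_dma call shown above. */
static void set_dma_sketch(DmaRegs *dma, uint32_t ctrl,
                           uint32_t src, uint32_t src_idx, uint32_t counter,
                           uint32_t dst, uint32_t dst_idx, uint32_t link_ptr)
{
    dma->src      = src;
    dma->src_idx  = src_idx;
    dma->counter  = counter;
    dma->dst      = dst;
    dma->dst_idx  = dst_idx;
    dma->link_ptr = link_ptr;
    dma->control  = ctrl;    /* written last so the start bits act on a complete setup */
}
```

Writing the control word last is a design choice of this sketch: the transfer parameters are in place before anything can start the channel.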
Bootload of C Code for the TMS320C5x

Contributed by Jason Chyan

Problem How can I generate my boot code with C?

Solution Use the -c (not -cr) option in the linker and build a single section that includes
the .text, .cinit, .bss, etc., sections that you want to be in the boot code.
Then use DSPHEX to convert this single section into boot code. Following is an
example linker command file to link several sections and .cinit into one output section.

-o filename.out
-m filename.map
filename.obj
-stack 64
-l rts50.lib
-l flib50.lib

MEMORY
{
PAGE 0: PROG: origin = 0x0800, length = 0x1a00
PAGE 1: DATA: origin = 0x0060, length = 0x0020
}

SECTIONS
{
    bootsect : {
        rts50.lib (.text) = 0800h
        *(.text)
        .=e00h;
        .cinit=.;
        *(.cinit)
        .+= 1;
        .=00f00h;
        *(.const)
        .=01000h;
        *(.stack)
        .=01040h;
        *(.bss) } load=0800h PAGE
The command file for the DSPHEX will be:

filename.out
-t
-bootorg 08000h
SECTIONS { bootsect = boot }

The program entry point of C code is _c_int0. Therefore, in the linker
command file, _c_int0 has to be assigned the starting address. This is
done by the first line in SECTIONS:

rts50.lib (.text) = 0800h

Since the .cinit section is hidden inside another section, we need to make it
visible to the linker with

.cinit=.;
*(.cinit)
.+=1;

The commands in SECTIONS assign a starting address to each input section,
relative to the starting address of the first section. This means that
.cinit starts from 0x1600, .const starts from 0x1700, .stack starts from
0x1800, and .bss starts from 0x1840. If you don't want to generate unused space
between sections, you can remove the ".=0xxxxh;" commands and all
the sections will be placed consecutively. When you link the file with the
example linker command file above, you will get the following warning messages:
"output file has no .text section" and "output file has no .bss section." You can
How to Convert a HEX30 Output File Into a
Linkable Assembly File

Contributed by Gerald Capwell and Rosemarie Piedra

Design Problem I used HEX30 to generate a 1-section output file in PROM programmer format. I
want to link the data contained in that file with the rest of my application.

Solution HEX30 takes a 'C3x/'C4x COFF file and converts it into a PROM programmer file
(i.e., Intel, Motorola, and ASCII formats). Please use the Intel or ASCII format
when trying to link this data with your application. The conversion process is below:

1. Generate a single HEX30 output file in Intel or ASCII format. As an example,
we use ASCII format (the HEX30 output file is called child.a0 in this
description). The output will be a PROM programmer file listing the data
of the COFF file and other control characters.

2. Now you must use the Hex-to-Assembly utilities to convert the programmer file
to an Assembly file. HEX2ASM.EXE is a self-unarchiving executable
containing two utilities called ASCI2ASM.EXE and INTL2ASM.EXE. These
utilities extract the data from the PROM programmer file and create an .asm file
which contains a .sect table listing each 32-bit word of code. For more
information about the utilities, please read the documentation located in the
HEX2ASM archive file. An example using ASCI2ASM.EXE is shown below.



Example: ASCI2ASM child.a0 child

The arguments identify:

• the HEX30 child output file
• the name of the .asm file that the utility creates
• the section name to be assigned to the boot table

The child.asm file that the utility creates contains the raw data extracted from
the HEX30 output file as follows:

        .sect   "tablename"
        .global _l1, _l2
_l1     .word   xxxx        ; first data word of child.a0 w/ label
        .word   yyyy
        .word   zzzz
_l2     .word   wwww        ; last data word of child.a0 w/ label
        .end
The ASCI2ASM utility also allows you to set labels at the beginning and/or at
the end of the section (see _l1 and _l2). The labels can be used externally and
referred to by the code to which the file is linked.
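
To make the shape of the generated file concrete, the sketch below writes an array of 32-bit words as a labeled .sect table. It is an illustration in C only and is not the ASCI2ASM utility: the emit_sect_table and append names are hypothetical, the labels are placed on their own lines for simplicity, and the h-suffixed hexadecimal notation is an assumption about what the assembler accepts.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append formatted text to out; returns 0 on success, -1 if it does not fit. */
static int append(char *out, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;                       /* error or truncated output */
    *used += (size_t)n;
    return 0;
}

/* Write words[] as an assembly table with a label on the first and last word.
 * Returns the number of characters written, or -1 if the buffer is too small. */
static int emit_sect_table(char *out, size_t size, const char *section,
                           const char *first_label, const char *last_label,
                           const uint32_t *words, size_t nwords)
{
    size_t used = 0;

    if (append(out, size, &used, "\t.sect\t\"%s\"\n", section) ||
        append(out, size, &used, "\t.global\t%s, %s\n", first_label, last_label))
        return -1;

    for (size_t i = 0; i < nwords; i++) {
        if (i == 0 && append(out, size, &used, "%s:\n", first_label))
            return -1;
        if (i == nwords - 1 && append(out, size, &used, "%s:\n", last_label))
            return -1;
        if (append(out, size, &used, "\t.word\t0%08lXh\n", (unsigned long)words[i]))
            return -1;
    }

    if (append(out, size, &used, "\t.end\n"))
        return -1;
    return (int)used;
}
```

With a single word, both labels precede the same .word line, so the first and last labels refer to the same address.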

The 'C3x/'C4x linker also offers what are called "linker variables" that can be
used in your linker command file to create labels pointing to the beginning
and/or the end of the section. Refer to the Assembler/Linker User's Guide. If
using the "linker variables" option, the linker command file should include the
following text:

SECTIONS
{
    .child: { _l1 = .;
              *(.child)
              _l2 = .-1; } > RAM1
}



• Note: For conversion from Intel PROM format, the utility called 
"INTL2ASM.EXE" must be used (located in archive file called 
HEX2ASM.EXE) 

3. Now you have a regular . asm file that you may assemble and link with your 
main program. 

For further clarification, the following application notes illustrate the use of 
the HEX30 and HEX2ASM utilities: 

• "Bootloading 'C4x Networks" (BBS filename = C4XNETB.EXE) 

• "Exploring 'C4x Networks" (BBS filename = EXPLORE.EXE) 

• "Hex-to-Assembly Conversion Utilities" (BBS filename = HEX2ASM.EXE) 
Supporting External DMA Activity to Internal RAM for
TMS320C5x Devices With the PZ Package

Contributed by Jim Larimer

Design Problem When moving to the small outline and footprint of the thin-quad-flat package
(TQFP) for TMS320C5x devices, some functionality is removed to reduce pin
count. This move from the 132-pin quad-flat package (PQ package) to the 100-pin
TQFP (PZ package) removes two functional pins: interrupt acknowledge (IACK)
and instruction acquisition (IAQ). Aside from its traditional function, the IAQ pin
is also used to acknowledge the bus request (BR) signal for external DMA access to
the single-access RAM.

How do I use the DMA capability with the thin-quad-flat package?

Solution All 'C5x devices with single-access RAM ('C50, 'C51, and 'C53) offer a unique
feature allowing another processor to read and write to their internal memory. The
TMS320C51 and TMS320C53 are offered in a 132-pin quad-flat package (PQ
package) and a 100-pin TQFP (PZ package) for systems with size constraints. To use
the DMA capability with the TQFP package, the following should be considered.

To initiate a read or write operation to the 'C5x single-access RAM, the Host or
Master processor requests a hold state on the DSP's external bus. When
acknowledged with HOLDA, the Host can then request access to the internal bus by
lowering the bus request line (BR). Unlike the hold mode, which allows the existing
operation to complete and allows the CPU operation to continue (if status bit
HM = 0), a BR-requested DMA always freezes the operation currently being
executed by the CPU. Because of this, the time required to grant the access to the
internal single-access RAM is deterministic and does not vary. Access to the internal
bus is always granted on the third cycle after the bus request signal is received.
Therefore, the IAQ signal is not an essential signal for external DMA activity to the
single-access RAM. The host is required to wait two 'C5x cycles after driving the bus
request line low. The host can then safely assume that access to the internal bus has
been granted and begin the DMA operation.

All other signals and timing conditions required for DMA access are the same as
those listed in the TMS320C5x User's Guide.
Contributed by Lawrence Wong

Design Problem How do I implement an efficient binary search algorithm on the TMS320C5x
family that takes advantage of the 'C5x's capabilities?

Solution There are many ways to implement the classical binary search algorithm, but very
few take advantage of the 'C5x's advanced architecture and instruction set.
The following is one of many possible examples of the TMS320C5x
executing the binary search algorithm.

The program takes advantage of the 'C5x's capability of performing bit-reversed
addressing to halve the search step after each test, thereby freeing the accumulator
for other tasks. Also, instead of using conditional branching to perform the testing,
the execute conditional (XC) instruction is used, thereby saving cycles and
increasing performance.

This routine performs a binary search on an ordered table. It assumes that the
table is ordered from low to high, where the largest number is located in the highest
memory of the array. Modifications can be made to reverse the ordering, if necessary.

This program also assumes that the size of the search table is some integer power
of 2 (i.e., 2^N where N = 11 in the following program). As a result, the search
never passes the last entry in the array. A maximum of N iterations is required to
complete the search or determine that the search failed. Modifications can be made if
the size of the array is not a power of 2. In order to do this, test conditions will have
to be included to determine if the last entry has been passed.

This function returns the address of the found number, which is stored in the
ACCUMULATOR. A 0x0000 address in the ACCUMULATOR signifies that the
search was unsuccessful.
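
The search strategy can be sketched in portable C before reading the assembly. The version below is an illustration under stated assumptions, not a translation of the listing: it starts at the first entry with a step of half the table size, moves forward or back by the step, and halves the step after each test. The search_pow2 name is hypothetical, and it returns an index (or -1 on failure) rather than an address in the accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Search a table sorted from low to high whose size n is a power of 2.
 * Returns the index of key, or -1 if key is not in the table. */
static long search_pow2(const int16_t *table, size_t n, int16_t key)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* n must be a power of 2 */

    size_t pos = 0;
    if (key < table[0])
        return -1;                          /* below the first entry */

    for (size_t step = n / 2; ; step /= 2) {
        if (key == table[pos])
            return (long)pos;               /* found */
        if (step == 0)
            return -1;                      /* step exhausted: not present */
        if (key > table[pos])
            pos += step;                    /* too low in the table: jump forward */
        else
            pos -= step;                    /* too high in the table: jump back */
    }
}
```

Because the steps n/2, n/4, ..., 1 sum to n-1 and the first move is always forward, the position stays inside the table without any bounds test, which is the property the power-of-2 assumption buys.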
        .bss    NTABLE,800h     ; Sorted search table from low to high
        .bss    LOOK,1          ; Search value
        .mmregs
        .text
        .
        .
        call    bsearch
        .
        .
bsearch lar     AR0,#0800h      ; AR0 size of array
        mar     *,AR0
        mar     *BR0+,AR3       ; Half the size of the array
        lar     AR3,#NTABLE     ; AR3 points to beginning of array
        lacl    #11             ; RPT N times, size of array is 2^N
        samm    BRCR            ; Setup repeat block
        ldp     #LOOK
        lacc    LOOK            ; Begin search
        sub     *               ; Compare data at AR3
        bcnd    nothere,LT      ; ERROR not found in this array
        rptb    nothere-1
        bcnd    found,EQ        ; Check if found
        xc      1,GT            ; If too low on array
        mar     *0+,AR0         ; Jump forward
        xc      1,LT            ; If too high on array
        mar     *0-,AR0         ; Jump back
        mar     *BR0+,AR3       ; Half the search space
        lacc    LOOK
        sub     *
nothere retd                    ; Did not find value in the table
        zac                     ; return 0x0 for failed search
        nop
found   ldp     #0
        apl     #0fffeh,PMST    ; disable block repeat bit
        retd
        lamm    AR3             ; return address of search
        nop

Figure 1.
Contributed by Eric Wilbur

Design Problem How is a random number generated on a TMS320C5x?

Solution The philosophy of the term "random" (i.e., how random is random?) has been
argued for centuries. I'm sure there were probably several duels held over the years as a
result of disagreements on this topic. Some argue that using a computer (a precise,
logical, predictable device) to produce random numbers is quite ironic (but useful!).
Purists would state that the only truly random event in nature is the time delay
between clicks of a Geiger counter placed near a piece of radioactive material.

The goal of this application note is not to solve the ongoing debate over the issue
of randomness and somehow vindicate one side or another, but to provide a fast,
proven, useful random number generator that can be used in various fixed-point
applications.

Theory and Implementation

Many algorithms exist to generate random or pseudo-random numbers. The design
objectives of this algorithm were speed, simplicity, "good" results, and ease of
integrating the code into any application. Based on these criteria, a form of uniform
deviate called the linear congruential method (introduced by D. Lehmer in 1951) was
used. The advantages of this method are speed, simplicity to code, and ease of use.
However, if care is not taken in choosing the multiplier and increment values, the
results can quickly become degenerate. This algorithm produces 65,536 unique
numbers and the correlation is very good. Only the LSB exhibits a repeatable pattern
every 16 calls.

The linear congruential method has the following form:

Rndnum(n) = (Rndnum(n-1) * MULT) + INC (mod M)

Where: Rndnum(n) = current random number
       Rndnum(n-1) = previous random number
       Rndnum(1) = SEED value (arbitrary constant)
       MULT = multiplier (unique constant)
       INC = increment (unique constant)
       M = modulus (word width of 'C5x = 16 bits = 64K)

Much research has been done to identify the optimal choices for the constants
MULT and INC. The constants used in this implementation are based on this
research. If changes are made to these numbers, extreme care must be taken to
avoid degeneration. Following is a more detailed look at the algorithm and the
numbers used:

M: M is the modulus value and is typically defined by the word width of the
processor. This algorithm will return a random number between 0 and
65,535 and is NOT internally bounded. If the user requires a min/max
limit, this must be coded externally to this routine. The result is not
actually divided by 65,536. The accumulator is allowed to overflow, thus
implementing the modulus.

SEED: The first random number in the sequence is called the seed value. This is
an arbitrary constant between 0 and 64K. Zero can be used, but the first
two results of the generator will be 0 and 1. This is OK if the code is
allowed 3 calls to "warm up" before the numbers are taken seriously. The
number 21,845 was used in this implementation because it is 1/3 of the
modulus (65,536).

MULT: Based on random number theory, this number should be chosen such
that the last three digits are even-2-1 (such as xx821, x421, etc.). The
number 31,821 was used in this implementation. Caution: the generator
is extremely sensitive to the choice of this constant!

INC: In general, this constant can be any prime number related to M. Two
values were actually tested in this implementation: 1 and 13,849.
Research shows that INC should be chosen based on the following
formula:

INC = ((1/2) - (1/6) x sqrt(3)) x M (Using M=65,536, INC=13,849)

Note: This implementation can be modified to return a 32-bit or 8-bit random
number if necessary. For the 32-bit number, simply modify the code to execute a
32x32 multiply instead of 16x16. Remember, your modulus is now 2^32. If an
8-bit result is desired, the low or high byte of the 16-bit result can be used.
However, randomness is not guaranteed; duplications will exist.
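
The recurrence is easy to check on a host machine. The following C sketch uses the SEED, MULT, and INC values given above; the init_rand16 and rand16 names and the use of uint16_t truncation to stand in for the 16-bit store are assumptions of this sketch, not the 'C5x code.

```c
#include <stdint.h>

#define RAND16_SEED 21845u   /* 65536/3 */
#define RAND16_MULT 31821u
#define RAND16_INC  13849u

static uint16_t rndnum;      /* current random number */

/* Load the seed value. */
static void init_rand16(void)
{
    rndnum = RAND16_SEED;
}

/* Return the next number in the sequence. */
static uint16_t rand16(void)
{
    /* 32-bit product plus increment; keeping only the low 16 bits
     * implements the mod 65,536 without a divide. */
    rndnum = (uint16_t)((uint32_t)rndnum * RAND16_MULT + RAND16_INC);
    return rndnum;
}
```

With these constants the first value after seeding is 3242, and the sequence returns to the seed only after 65,536 calls, which is consistent with the claim of 65,536 unique numbers.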

The Code 



; RANDOM NUMBER GENERATOR FOR THE TMS320C5X DSPs
;
; Title:        Rand16.ASM
; Author:       Eric Wilbur
; Date:         October 1993
; Application:  Random Seeks for Hard Disk Drive
; Target DSP:   TMS320C51
; Usage:        To initialize: Call _InitRand16
;               To get the next random number: Call _Rand16
; Assumptions:  SXM,OVM = don't care
;               SPM = 0 (no shift)
; Input:        None
; Output:       ACCL = 16-bit random number

; MEMORY ALLOCATION

Rndnum  .usect  "Variables",1   ; allocate space for random
                                ; number result

; INITIALIZE CONSTANTS

SEED    .set    21845           ; arbitrary seed value (65536/3)
MULT    .set    31821           ; multiplier value (last 3
                                ; digits are even-2-1)
INC     .set    13849           ; 1 and 13849 have been tested

; CODE START

        .text

; INITIALIZE RANDOM NUMBER GENERATOR - Load the SEED value

_InitRand16:
        LDP     #Rndnum
        LACC    #SEED           ; ACC = SEED value
        SACL    Rndnum          ; Rndnum = SEED
        RET                     ; return to caller

; GENERATE NEXT RANDOM NUMBER

_Rand16:
        CLRC    OVM             ; clear overflow - implements
                                ; MOD 64K
        LDP     #Rndnum         ; set data page pointer
        LT      Rndnum          ; TReg = Rndnum
        MPY     #MULT           ; PReg = Rndnum * MULT
        PAC                     ; ACC = PReg
        ADD     #INC            ; ACC = Rndnum * MULT + INC
        SACL    Rndnum          ; store new random number
        RET                     ; return to caller

Figure 1.
Using a TMS320C30 Serial Port as an Asynchronous
RS-232 Port

Contributed by Corey Minyard, Bell Northern Research

Design Problem Although the TMS320C30 serial ports were designed to be used as synchronous ports,
they can be used as asynchronous ports with a little creative software. This
application note describes the hardware and software needed to use a 'C30 serial
port as an asynchronous port.

Solution How it Works

This design relies on the fact that received RS-232 signals always start with a "start
bit" that is not part of the data and end with one or more "stop bits" that are also
not part of the data. This design keeps the receiver turned off and an interrupt (also
tied to the receive line) turned on when not receiving a character. When the
interrupt goes off, this signals a start bit on the line. The code then turns the interrupt
off and the receiver on; the data comes in as a normal 8-bit character. The stop bits
ensure the 'C30 has time to handle the data before the next character.

The transmitter basically frames the data into 16-bit words, adding a start bit,
the character to send, and the stop bits. This will result in up to 6 clock cycles
(RS-232 clock rate) where bandwidth on the channel is "wasted." (Think of it as having
7 stop bits. That's kind of how it works.) A more efficient (but more complicated)
design could be done, but was not necessary for my project. Characters go in and out
of the serial port "backwards" from the RS-232 method; they must be bit-swapped to be
correct.
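
The framing can be sketched in a few lines of C. The example below follows the expression used by the transmit code later in this note, which masks the bit-swapped character with 0xfeff and ORs in 0xfe00: seven high bits (the idle or stop level), one low bit (the start bit), then the eight data bits. The reverse8 and frame_char names are hypothetical, and the reading of the bit order assumes the serial port shifts the most-significant bit out first.

```c
#include <stdint.h>

/* Swap the bit order of an 8-bit character (bit 0 becomes bit 7, and so on). */
static uint8_t reverse8(uint8_t c)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | (c & 1u));
        c >>= 1;
    }
    return out;
}

/* Build the 16-bit word to transmit:
 *   bits 15-9 = 1 (idle/stop level), bit 8 = 0 (start bit),
 *   bits 7-0  = character with its bit order swapped. */
static uint16_t frame_char(uint8_t c)
{
    return (uint16_t)(0xFE00u | reverse8(c));
}
```

For a carriage return (0x0d) the swapped byte is 0xb0, so the framed word is 0xfeb0.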

The serial port is set up as a continuous transmit normal port. Frame syncs are
not used and internal clocks are used for the serial port timing; these are timed off
the 'C30 clock and need to be adjusted if clock rates change.

Hardware

Little hardware design needs to be done to handle this; basically, just wire the serial
port to the 'C30 properly.
Figure 1.

Notice that the received signal is tied to DRx (Data Receive) and also INTx
(any of the 'C30 interrupts). DXx (Data Transmit) must be pulled high to
avoid having the 'C30 constantly supply "1" on the line when it has
nothing to transmit. The clocks and frame syncs are not used.

Software

The real meat of this design lies in the software. It must handle the interrupts,
the port setups, and the queueing of the data. Actual code and descriptions follow in
this article. The code ran under an operating system written by me, but the
operation of the OS routines should be obvious.

Transmitter

The transmitter does not do very much; it just frames the data properly, waits for
the transmitter to be free, and sends the data. The transmitter interrupt also
drove the OS timer tick; therefore the transmitter was constantly driven with
data even when idle.

Receiver

The receiver does a lot more than the transmitter; some interrupt tricks supply
the necessary "sync to async" conversion. Normally the serial port receiver is
turned off. An interrupt (the rec_coming interrupt) comes in when a start bit
arrives at the receiver. This turns off the rec_coming interrupt and starts the
receiver. The next 8 bits coming in the serial port should be the character desired.
After these 8 bits are received, the rec0 interrupt goes off. This handles the
received character, turns off the receiver, and turns the rec_coming interrupt back on
to wait for the next start bit.
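
The receive sequence is a two-state machine, sketched below in C. This is a host-side model for illustration only: the AsyncRx structure and the rx_init, on_start_bit, and on_receive_complete names are hypothetical and do not touch any 'C30 registers.

```c
#include <stdbool.h>

/* Hypothetical model of the receive side. */
typedef struct {
    bool start_int_enabled;  /* start-bit (edge) interrupt armed */
    bool receiver_enabled;   /* serial port receiver running */
    int  last_char;          /* most recent character, -1 if none yet */
} AsyncRx;

/* Idle state: receiver off, start-bit interrupt armed. */
static void rx_init(AsyncRx *rx)
{
    rx->start_int_enabled = true;
    rx->receiver_enabled  = false;
    rx->last_char         = -1;
}

/* Start bit seen on the line: stop watching for edges, start the receiver. */
static void on_start_bit(AsyncRx *rx)
{
    if (!rx->start_int_enabled)
        return;                      /* edges inside a character are ignored */
    rx->start_int_enabled = false;
    rx->receiver_enabled  = true;
}

/* Eight bits received: keep the character, then return to the idle state. */
static void on_receive_complete(AsyncRx *rx, int raw)
{
    if (!rx->receiver_enabled)
        return;
    rx->last_char         = raw & 0xff;
    rx->receiver_enabled  = false;
    rx->start_int_enabled = true;    /* wait for the next start bit */
}
```

Keeping the start-bit interrupt disabled while the receiver runs is what stops the data bits of a character from being mistaken for new start bits.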



/******************************************************************************/
/*
 * io.c - The I/O routines and tasks to handle I/O to the C30 serial port.
 */

#include "monitor.h"
#include "debug.h"
#include "io.h"

Queue_Id gets_queue;
Queue_Id rec_int_queue;
Queue_Id io_state;
int rec_ready;

static Queue_Id wait_rec_int[2];
static Queue_Id wait_rec_cmd[2];

/*
 * invert_8 - swaps the bits in the 8 bit character supplied.
 */
char invert_8 (inchar)
char inchar;
{
    char outchar;

    outchar = 0;
    if (inchar & 0x01)
    {
        outchar |= 0x80;
    }
    if (inchar & 0x02)
    {
        outchar |= 0x40;
    }
    if (inchar & 0x04)
    {
        outchar |= 0x20;
    }
    if (inchar & 0x08)
    {
        outchar |= 0x10;
    }
    if (inchar & 0x10)
    {
        outchar |= 0x08;
    }
    if (inchar & 0x20)
    {
        outchar |= 0x04;
    }
    if (inchar & 0x40)
    {
        outchar |= 0x02;
    }
    if (inchar & 0x80)
    {
        outchar |= 0x01;
    }
    return (outchar);
}

Figure 2.

/*
 * The get string task. This receives requests to receive strings, then
 * receives them and sends the result back to the requesting task.
 */
void
gets_task ( )
{
    unsigned int my_tid;
    unsigned int msg;
    unsigned int tid;
    unsigned int qid;
    unsigned int dummy1;
    unsigned int dummy2;
    Buffer_Id buf;
    void *bufptr;
    char outbuf[3];
    char *out_loc;
    unsigned int count;
    unsigned int max_size;
    int finished;
    unsigned int c;

    io_state = NODEBUG_STATE;

    my_tid = 0;
    os_task_inquiry(&my_tid, NULL);     /* Get my task id (and therefore my */
                                        /* main queue id). */
    os_create_queue(&gets_queue);       /* Create another queue for requests */
                                        /* to get data. */
    rec_int_queue = my_tid;             /* My main queue same as tid */

    wait_rec_int[0] = rec_int_queue;    /* Set up queue lists for wait queues */
    wait_rec_int[1] = END_QUEUE;
    wait_rec_cmd[0] = gets_queue;
    wait_rec_cmd[1] = END_QUEUE;

    while (TRUE)
    {
        rec_ready = FALSE;              /* Not receiving any data here */

        /* Wait for someone to request a string */
        os_wait_fetch(wait_rec_cmd, &msg, &buf, &bufptr, &tid, &qid);
        if (buf != NO_BUFFER)
        {
            os_free_buffer(buf);
        }

        rec_ready = TRUE;               /* Now we are receiving data */

        /*
         * The following is not 32-bit clean, but it doesn't matter for
         * 'C30s
         */
        max_size = (msg >> 24) & 0xff;  /* Get the num bytes to receive */
        out_loc = ((char *) (msg & 0xffffff)); /* Get the address to put */
                                               /* the string in. */
        count = 0;
        finished = FALSE;
        while (!finished)
        {
            if (count == max_size)      /* If all the data is in, send a msg */
            {                           /* back to the requestor */
                *out_loc = '\0';
                finished = TRUE;
                os_put_queue(REC_FINISHED, NO_BUFFER, tid);
            }
            else
            {
                /* Wait for the receiver to send me some data */
                os_wait_fetch(wait_rec_int, &c, &buf, &bufptr, &dummy1, &dummy2);
                if (buf != NO_BUFFER)
                {
                    os_free_buffer(buf);
                }
                outbuf[0] = c;          /* Put the received data into a buf */
                outbuf[1] = '\0';       /* so it can be echoed. */
                puts(outbuf);           /* Echo the data */
                if (c == '\n')          /* If a newline is received, finish */
                {                       /* the receive. */
                    *out_loc = '\0';
                    finished = TRUE;
                    os_put_queue(REC_FINISHED, NO_BUFFER, tid);
                }
                else                    /* else put the character into the */
                {                       /* buffer. */
                    *out_loc = c;
                    out_loc++;
                    count++;
                }
            }
        }
    }
}
/*
 * Receive handler. This routine is called by the interrupt handler that
 * is called when a byte is received from the com port.
 */
void
rec_hndl ( )
{
    int rec_char;

    regioncount = 1;

    /* Data from RS-232 is backwards, flip it around */
    rec_char = invert_8((*RECLOC) & 0xff);
    if (rec_char == 0x0d)           /* Map ctrl-m to newline (No raw mode!) */
    {
        rec_char = '\n';
    }
    if (io_state == DEBUG_STATE)    /* If the debugger is on, send all */
    {                               /* data to it. */
        os_put_queue(rec_char, NO_BUFFER, debug_q);
    }
    else if (rec_char == 0x03)      /* A ctrl-c activates the debugger. */
    {
        io_state = DEBUG_STATE;
        os_start_task(debug_tid);
    }
    else if (rec_ready)     /* Send data to the gets task if it wants it. */
    {
        os_put_queue(rec_char, NO_BUFFER, rec_int_queue);
    }
    regioncount = 0;
}

#define XMTLOC ((int *) 0x808048)
#define RECLOC ((int *) 0x80804c)
#define XMT_PRT_CTL ((int *) 0x808042)

Queue_Id puts_queue;
Queue_Id xmt_int_queue;

static Queue_Id wait_xmt_int[2];
static Queue_Id wait_xmt_cmd[2];

int xmt_data;

/*
 * The put string routine. This task will put strings out to the serial port.
 */
void
puts (string)
char *string;
{
    Task_Id my_tid;
    Queue_Id wait_fini[2];
    unsigned int msg;
    Task_Id tid;
    Queue_Id qid;
    Buffer_Id buf;
    void *bufptr;

    my_tid = 0;
    os_task_inquiry(&my_tid, NULL);     /* Get my task id. */
    wait_fini[0] = my_tid;              /* Use my task id as the queue to */
    wait_fini[1] = END_QUEUE;           /* receive xmit ready messages. */

    /* Send a pointer to the string to the transmit task. */
    os_put_queue((unsigned int) string, NO_BUFFER, puts_queue);

    /* Wait for it to respond. */
    os_wait_fetch(wait_fini, &msg, &buf, &bufptr, &tid, &qid);
    if (buf != NO_BUFFER)
    {
        os_free_buffer(buf);
    }

    /* Ignore all messages that are not a send finished from the xmit task */
    while (msg != SEND_FINISHED)
    {
        os_wait_fetch(wait_fini, &msg, &buf, &bufptr, &tid, &qid);
        if (buf != NO_BUFFER)
        {
            os_free_buffer(buf);
        }
    }
}

/*
 * The put string task. This task will wait for strings on its input queue
 * and transmit them to the serial port.
 */
puts_task()
{
    Task_Id my_tid;
    char *msg;
    Task_Id tid;
    Queue_Id qid;
    unsigned int dummy1, dummy2, dummy3;
    int newline_flag;
    Buffer_Id buf;
    void *bufptr;

    my_tid = 0;
    os_task_inquiry(&my_tid, NULL);    /* Get my task id. */
    os_create_queue(&puts_queue);      /* Create a queue to get send requests */

    xmt_int_queue = my_tid;            /* My queue to get transmitter */
                                       /* interrupt messages. */

    wait_xmt_int[0] = xmt_int_queue;   /* Set up receive queues. */
    wait_xmt_int[1] = END_QUEUE;
    wait_xmt_cmd[0] = puts_queue;
    wait_xmt_cmd[1] = END_QUEUE;

    newline_flag = FALSE;
    xmt_data = FALSE;

    while (TRUE)
    {
        /* Wait for a string to transmit. */
        os_wait_fetch(wait_xmt_cmd, (unsigned int *) &msg, &buf, &bufptr, &tid,
                      &qid);
        if (buf != NO_BUFFER)
        {
            os_free_buffer(buf);
        }

        /*
         * Ok, now I am transmitting. Wait for the transmitter to tell me
         * that I can send some data.
         */
        xmt_data = TRUE;
        os_wait_fetch(wait_xmt_int, &dummy1, &buf, &bufptr, &dummy2, &dummy3);
        if (buf != NO_BUFFER)
        {
            os_free_buffer(buf);
        }

        /*
         * Send the whole message. Make sure to send the last new line
         * even if currently pointing to the EOS character.
         */
        while ((*msg != '\0') || (newline_flag))
        {
            if (newline_flag)   /* If transmitting a newline, (ctrl-j), also */
            {                   /* send a carriage return (ctrl-m). */
                *XMTLOC = (((int) invert_8((char) 0x0d)) & 0xfeff) | 0xfe00;
                newline_flag = FALSE;
            }
            else
            {
                if (*msg == '\n')   /* If a newline, set up to send a */
                {                   /* ctrl-m next. */
                    newline_flag = TRUE;
                }

                /*
                 * Put the character into the output buffer. The first
                 * 7 bits are transmitted as 1, the next is the start bit,
                 * the rest is the character.
                 */
                *XMTLOC = (((int) invert_8(*msg)) & 0xfeff) | 0xfe00;
                msg++;
            }

            /* Wait for the transmitter to tell me I can send the next char */
            os_wait_fetch(wait_xmt_int, &dummy1, &buf, &bufptr, &dummy2, &dummy3);
            if (buf != NO_BUFFER)
            {
                os_free_buffer(buf);
            }
        }

        xmt_data = FALSE;    /* No longer receiving data. */
        *XMTLOC = 0xffff;    /* Prime the transmitter to send ones. */

        /* Inform the requestor that the send is finished. */
        os_put_queue(SEND_FINISHED, NO_BUFFER, tid);
    }
}
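The transmit word built in puts_task can be checked in isolation. The sketch below repeats the masking expression from the listing, with a hypothetical bit-reversal helper (mirror8) standing in for invert_8(), whose body is not shown here. The function name make_xmt_word is illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-in for invert_8(): mirror the low 8 bits. */
static unsigned int mirror8(unsigned int value)
{
    unsigned int result = 0;
    int i;

    for (i = 0; i < 8; i++) {
        result = (result << 1) | (value & 1u);
        value >>= 1;
    }
    return result;
}

/* Build the word written to *XMTLOC, using the expression from the listing:
 *   bits 15..9 are forced to 1, bit 8 is forced to 0 (the start bit),
 *   bits 7..0 carry the bit-reversed character. */
unsigned int make_xmt_word(unsigned char ch)
{
    unsigned int word = (mirror8(ch) & 0xfeffu) | 0xfe00u;

    assert((word & 0x0100u) == 0);  /* start bit must be low */
    return word;
}
```

With these assumptions, the letter 'A' (0x41) becomes the word 0xfe82, and a carriage return (0x0d) becomes 0xfeb0.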

/*
 * Transmit interrupt handler. This routine is called whenever the transmit
 * interrupt for the serial port goes off. It continuously keeps the
 * transmitter primed because the transmit interrupt is also used as the
 * clock interrupt.
 */
void
xmt_hndl()
{
    if (xmt_data)   /* If the puts task is waiting for interrupt info... */
    {
        regioncount = 1;   /* Interrupts should already be turned off, */
                           /* set the critical region count to */
                           /* reflect that. */

        /*
         * Send a message to the puts task to tell it to send the next
         * char. If the send fails, go ahead and prime the transmitter.
         */
        if (os_put_queue(0, NO_BUFFER, xmt_int_queue) != 0)
        {
            *XMTLOC = 0xffff;
        }

        regioncount = 0;
    }
    else   /* If the puts task is not sending, prime the transmitter */
    {
        *XMTLOC = 0xffff;
    }
}



;******************************************************************************
; ioasm.asm - the assembly language support routines for I/O handling for the
;             C30 serial port.

        .global xmt0
        .global rec0
        .global rec_coming
        .global _init_io
        .global _rec_hndl
        .global _xmt_hndl
        .global _os_tick, save_task, restore_task

* xmt0 - handle an interrupt from the serial port transmitter. This also
*        calls the OS tick routine. Note that the save and restore tasks
*        are called because this can result in a task switch.

        .text
xmt0
        CALL    save_task
        CALL    _xmt_hndl
        CALL    _os_tick
        CALL    restore_task
        RETI






rec_ser_cnt .word   808040h         ; Address of serial port status register
s_recc_int  .word   000000002h      ; Mask for the receive interrupt
c_recc_int  .word   0fffffffdh      ; Inverted mask to clear the rec int.
reset_rec   .word   0f7ffffffh      ; Mask to write a 0 to the rec reset
unreset_rec .word   008000000h      ; Unreset the receiver.




* rec0 - Handle an interrupt from the serial port receiver to inform it
*        of the receipt of a byte on the serial port. This routine will
*        turn off the receiver and restore the interrupt telling it that
*        a byte is about to come.
rec0
        CALL    save_task
        LDP     @rec_ser_cnt,DP
        LDI     @rec_ser_cnt,AR0
        LDI     *+AR0(0),R0         ; Reset the receiver
        AND     @reset_rec,R0
        STI     R0,*+AR0(0)
        AND     @c_recc_int,IF      ; clear receive coming interrupt
        OR      @s_recc_int,IE      ; enable the interrupt for the
                                    ; next byte
        CALL    _rec_hndl
        CALL    restore_task
        RETI




;* rec_coming - This interrupt handler is called by INT1. It is tied to the
;*   receive data line; it will be called when the start bit is received
;*   for a character of information. It will turn on the serial receiver
;*   (and the serial receiver interrupt) and turn its own interrupt off.
rec_coming
        PUSH    ST
        PUSH    R0
        PUSH    DP
        PUSH    AR0

        LDP     @rec_ser_cnt,DP
        LDI     @rec_ser_cnt,AR0

        AND     @c_recc_int,IE      ; do not allow the receive coming
                                    ; interrupt

        LDI     *+AR0(0),R0
        OR      @unreset_rec,R0     ; Ready the receiver
        STI     R0,*+AR0(0)

        POP     AR0
        POP     DP
        POP     R0
        POP     ST
        RETI




gl_prt_cnt  .word   0068400c4h      ; Initial setup for the serial port
                                    ; status register. This sets the
                                    ; following things:
                                    ;   FSX is output.
                                    ;   Fixed data rate signalling
                                    ;   Standard frame sync mode
                                    ;   Internal xmit clk
                                    ;   Internal rec clk
                                    ;   Active high DX and DR
                                    ;   XLEN - 16 bits
                                    ;   RLEN - 8 bits
                                    ;   Transmitter interrupt enabled
                                    ;   Receive interrupt enabled
                                    ;   Activate the transmitter
                                    ;   Deactivate the receiver

x_prt_cnt   .word   000000111h      ; Setup for the transmit port control
                                    ; register. Set all the transmit
                                    ; pins as serial port pins.
r_prt_cnt   .word   000000111h      ; Setup for the receive port control
                                    ; register. Set all the receive
                                    ; pins as serial port pins.

tmr_cnt     .word   0000003cfh      ; Setup for the timer control reg.
                                    ;   Starts the timer
                                    ;   Free run the timer (no hold)
                                    ;   Clock mode
                                    ;   Internal clock source

tmr_per     .word   00434042ah      ; Timer periods. This is 1076 for
                                    ; the receiver, which is a little
                                    ; slow. This makes sure we don't
                                    ; shift in time before the bits.
                                    ; These also assume a 20.48-MHz clock
                                    ; in the C30; these values will have
                                    ; to be adjusted for different clock
                                    ; rates.

enab_int    .word   000000032h      ; Enable serial xmit, serial receive,
                                    ; and int 1 for serial port stuff.

first_xmt   .word   00000ffffh



_init_io
        LDP     @rec_ser_cnt,DP
        LDI     @rec_ser_cnt,AR0

        LDI     @x_prt_cnt,R0       ; Set up the transmit control port.
        STI     R0,*+AR0(2)

        LDI     @r_prt_cnt,R0       ; Set up the receive control port.
        STI     R0,*+AR0(3)

        LDI     @tmr_per,R0         ; Set the timer period register.
        STI     R0,*+AR0(6)

        LDI     @tmr_cnt,R0         ; Set the timer control register.
        STI     R0,*+AR0(4)

        LDI     @gl_prt_cnt,R0      ; Set the global serial control register.
        STI     R0,*+AR0(0)

        OR      @enab_int,IE        ; Enable interrupts.

        LDI     @first_xmt,R0       ; Start the transmitter sending 1's
        STI     R0,*+AR0(8)

        RETS
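The tmr_per constant packs two 16-bit period values into one word. As a rough sketch of how it splits, assuming the receive period occupies the upper half and the transmit period the lower half (an assumption that agrees with the 1076 receive value quoted in the comment), the two fields can be pulled apart as follows. The function names are illustrative.

```c
#include <assert.h>

/* Upper 16 bits of the packed timer period word (assumed: receive period). */
unsigned int upper_period(unsigned long packed)
{
    assert(packed <= 0xffffffffUL);  /* the register is 32 bits wide */
    return (unsigned int)((packed >> 16) & 0xffffUL);
}

/* Lower 16 bits of the packed timer period word (assumed: transmit period). */
unsigned int lower_period(unsigned long packed)
{
    assert(packed <= 0xffffffffUL);
    return (unsigned int)(packed & 0xffffUL);
}
```

For the value 00434042Ah used in the listing, the upper half is 1076 and the lower half is 1066, so the receive period is slightly longer than the transmit period, as the comment describes.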







Fast External Memory Interface

Contributed by Joe George



Design Problem How can I connect zero-wait-state SRAM to a fast 'C5x without external decode 
logic? 

Solution Introduction 

The 'C5x provides the hardware engineer with two sets of signals for external
memory interface. The first, documented in Section 6 of the 'C5x User's Guide
(Figure 6-13), uses RD and WE. These signals allow a glueless interface to the
OE and WE pins, respectively, of various memories. In essence, the 'C5x supplies
the decode for you. As can be seen in Appendix B of the 'C5x User's Guide, the
RD and WE signals change a half cycle later than the address. But as the 'C5x
gets faster, the setup and hold timings for RD and WE, as shown on page A-14,
become more constrained and therefore more difficult for a memory to satisfy.
Thus the second set of signals, R/W and STRB, becomes useful for memory
interface.

As with the 'C2x and 'C3x, the R/W and STRB signals are usually decoded by
external logic (an alternative is shown later in this document). The most
important timing for a memory to meet when using R/W and STRB is read data
access from address valid (taA), as seen on page A-14 of the 'C5x User's Guide.
If this timing is satisfied, then by examining Appendix B and page A-14, it can
be shown by inspection that the write timing is also satisfied. The following
table summarizes taA for several 'C5x speeds.

Name            Instruction Cycle   taA
TMS320C5X-40    50 ns               32 ns
TMS320C5X-57    35 ns               20 ns
TMS320C5X-80    25 ns               15 ns

taA represents the amount of time allowed for the memory/logic to drive valid
data once the 'C5x has generated a valid address on its external bus during a
memory access. For example, the 'C5x EVM, which has a TMS320C50-40, uses 25-ns
SRAM and 7-ns PALs to deliver valid data in 25+7=32 ns, thus satisfying the
device's 32-ns requirement.

However, as one approaches using a 25-ns 'C5x in a similar fashion, one finds
that 10-ns memories and 5-ns PALs are quite expensive. Therefore it is
advantageous to examine a "No Decode" memory interface option.
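The access-time budget described above is simple addition, and can be sketched as a check that memory access time plus decode delay does not exceed taA. The function name and parameter names below are illustrative; the numbers in the usage note come from the table and the EVM example.

```c
#include <assert.h>

/* Returns 1 when the SRAM access time plus the decode-logic delay fits
 * within the read-data-access-from-address-valid budget (taA), else 0.
 * All times are in nanoseconds. */
int meets_access_time(int sram_ns, int decode_ns, int ta_a_ns)
{
    assert(sram_ns >= 0 && decode_ns >= 0 && ta_a_ns >= 0);
    return (sram_ns + decode_ns) <= ta_a_ns;
}
```

For the EVM, meets_access_time(25, 7, 32) succeeds. For a 15-ns budget, a 10-ns memory with a 5-ns PAL just fits, while a 15-ns memory fits only when the decode delay is zero, which is the motivation for the no decode scheme.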
No Decode Memory Interface

The no decode memory interface is seen on page 12-5 of the 1992 'C3x User's
Guide. The 'C5x has some additional features, such as partitionable software
wait states, which make the no decode memory interface an attractive solution.
The memory interface still uses R/W and STRB, but in a different way. The
basic concept is to allow the external SRAMs to be on continuously so that
0-wait-state operation is achieved. There is also no decode logic to slow the
bus cycle. When other devices on the DSP's bus are accessed (in I/O space or
even other sub-64K banks of program or data space), the fast SRAMs are removed
from the bus using the chip selects. The 'C5x's on-chip software-programmable
wait-state generator becomes an ideal device for giving external logic enough
time to juggle the chip selects of various devices on the external bus.

Let us now revisit the issue of taA timing. As an example, let's take a 20-ns
'C5x device with taA=15 ns max. Figure 1 below illustrates the no decode
zero-wait-state connection of external memory. Since taA is 15 ns, a 15-ns
memory would be sufficient to satisfy taA. The SRAMs in Figure 1 must have a
WE-controllable access feature allowing the RAM OE to be tied low. The DSP
memory strobe (STRB) is connected to the memory chip select (CS or CE), and the
DSP read-write signal (R/W) is connected to the RAM write enable (WE). Appendix
B of the 'C5x User's Guide verifies that this configuration works for all bus
cycles. In order for other devices to hang on the DSP's external bus, the SRAM
should have a second chip select used to remove it from the bus. This is shown
below:



[Block diagram: TMS320C51 (STRB, R/W); optional PAL/FPGA/ASIC (OTHER_CS);
WE-controlled SRAM (CS2, CS1, WE, OE)]

Figure 1. 'C5x no decode SRAM interface



The configuration above permits two types of memory schemes when using address
and memory strobes. Notice that in Figure 1, the address lines, PS, and DS are
not shown. The first possible configuration would use a 64K RAM (i.e., address
lines A0 to A15) where the corresponding DSP and RAM address lines are
connected. The PS and DS are left unconnected. This gives the combined program
and data scheme (dubbed the CPD scheme) as seen in a 'C5x EVM. This means
program and data spaces will overlap in the external RAM. There is no
differentiation of program and data in the DSP's external memory. This allows
flexible program/data allocation and also makes memory paging quite easy
(assuming you get higher-density SRAMs. If you need to cascade SRAMs, then chip
selects and S/W wait states need to be used, as seen in Figure 1. In either
case, it is assumed that the action of switching pages is not zero-wait-state).
The disadvantage of CPD is that the address range of the DSP has been cut in
half.

The second method of no decode logic address mapping is to use 128K memories
and connect A16 to one of the memory space strobes, PS or DS. Thus the single
bank of 128K RAMs is divided into separate 64K blocks of program and data
(dubbed the SPD scheme). This allows full-speed operation and switching between
two "banks" of program and data that are actually one bank of SRAM.

A disadvantage of both these schemes is subtly apparent in the name, no decode.
This means that sub-64K (or whatever maximum size) blocks cannot be paged,
enabled, etc. Also, mixing CPD and SPD schemes on the same bus has disadvantages
for paging (see Designer Note #46). The SPD scheme requires A16 to be tied to a
DSP memory strobe. The CPD scheme leaves the memory strobes unconnected.

The above configurations should allow a hardware designer to use the slowest,
and thus cheapest, memories possible when interfacing external memory to a fast
'C5x. This is becoming increasingly important as DSPs become faster.
TMS320C5x Memory Paging (Expanding its Address Reach)

Contributed by Joe George



Design Problem How can I extend the address space of a 'C5x device? 

Solution Since a 'C5x is a 16-bit machine with a 16-bit address, memory paging is needed if
more than 64K of memory is to be addressed in a particular space. An external
device needs to supply the upper addresses beyond the 16-bit memory range. This
is done by having the DSP write a value to a register located in its I/O space
whose outputs are the higher address bits. An example is shown below in Figure 1.



[Block diagram: TMS320C51 (A15-A0, D15-D0); DQFF-type register (EN, D3-D0 in,
Q3-Q0 out); SRAM (A19-A16, A15-A0, D15-D0)]

Figure 1. 'C5x paging hardware



Since the bank switch requires some action from the DSP, frequent switching
between banks is not very efficient. It would be best to partition tasks within
a bank and switch banks when starting new tasks. It may even be desirable to fix
a certain part of the memory as non-pageable where the task manager would run
(or use internal memory). This task manager kernel could determine on which page
a called function resides and swap banks accordingly.
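To make the paging arithmetic concrete, the following sketch forms the address seen by the SRAM from a 4-bit page register value and the 16-bit DSP address, matching the Q3-Q0 and A19-A16 labels of the paging hardware figure. The function name is illustrative, and the sketch assumes the page bits map directly onto A19-A16.

```c
#include <assert.h>

/* Combine the external page register (Q3-Q0) with the DSP address (A15-A0)
 * to form the 20-bit address presented to the SRAM (A19-A0). */
unsigned long paged_address(unsigned int page, unsigned int dsp_address)
{
    assert(page <= 0xfu);            /* four page bits in this example */
    assert(dsp_address <= 0xffffu);  /* 16-bit DSP address */
    return ((unsigned long)page << 16) | dsp_address;
}
```

With four page bits, sixteen 64K pages are reachable, so the highest address is paged_address(15, 0xffff), or FFFFFh.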
[Memory map: 64K x 16 'C5x data memory and 64K x 16 'C5x program memory
(separate program and data spaces), each with a fixed page 0 below 8000h and
pageable pages from 8000h to FFFFh]

Figure 2. General DSP memory map (separate program and data)



An example of a separate program and data scheme is shown above in Figure 2 
with a fixed 32K page in the lower half of memory and pageable 32K blocks in 
upper memory. 

Thus in software, any task function would be called through the task manager 
from the main code. For example: 

main()
{
    taskman(task1, parms...);
}

void taskman(task, parms...)
{
    case task {
        task1: asm(OUT pa0, BANK1)
               task1(parms...);
        task2: asm(OUT pa0, BANK2)
    }
    return(1)
}

task1(parms...)
{
    return(1)
}
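The pseudocode above can be turned into a small host-runnable sketch. Everything here is hypothetical scaffolding: the bank register is simulated by a variable instead of an OUT to I/O space, and the task table, task bodies, and function names are invented for illustration.

```c
#include <assert.h>

/* Simulated page register; on the target this would be written with an
 * OUT instruction to the I/O port that holds the bank bits. */
static int current_bank = 0;

static void select_bank(int bank)
{
    current_bank = bank;
}

typedef int (*task_fn)(int);

struct task_entry {
    task_fn fn;   /* entry point of the task */
    int bank;     /* bank in which the task's code resides */
};

/* Placeholder tasks standing in for real paged functions. */
static int task1(int parm) { return parm + 1; }
static int task2(int parm) { return parm * 2; }

static const struct task_entry task_table[] = {
    { task1, 1 },
    { task2, 2 },
};

/* Task manager: switch banks only when needed, then call the task. */
int taskman(int task, int parm)
{
    assert(task >= 0 &&
           task < (int)(sizeof task_table / sizeof task_table[0]));
    if (task_table[task].bank != current_bank) {
        select_bank(task_table[task].bank);
    }
    return task_table[task].fn(parm);
}

/* Report which bank is currently selected. */
int bank_in_use(void)
{
    return current_bank;
}
```

Keeping the bank number next to each entry point lets the task manager skip the switch when consecutive tasks share a bank, which fits the advice to avoid frequent bank switching.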
At present, the TMS320 debuggers do not understand paged code for various
spaces. They only understand page 0 for program, page 1 for data, and page 2 for
I/O (1994 'C5x Debugger User's Guide, pages 13-32 and 13-33). Any of these
spaces may be paged by the user. The debugger, which is unaware of paging, will
display the data values after the user reads the space. Also, if the symbol
information for a new page is desired, then an "sload" of the particular file
containing a maximum of 64K (per page) must be done. The linker, on the other
hand, understands up to 256 multiple pages (page 8-21 of Fixed-Point Assembly
Tools). Thus one may combine the object files of each page into a single .out
file using the linker. A smart loader can be written by the user to load the
entire program and data into the specific system. The user must remember that
the linker numbers its pages from 0 to 255, so the linker page numbers must be
checked against the DSP memory map. For example, data memory page 7 may
actually arbitrarily correspond to linker page 12.

With these techniques, a user may extend the address reach of a 'C5x far above
the 64K per memory space limited by the 16 address bits, allowing the use of more
verbose code on a cheaper fixed-point platform. TI is presently in the process of
studying the alternatives for cohesive expanded address reach support for all its tools.
TMS320C5X Clock Modes

Contributed by Joe George 



Design Problem 



Please explain the 'C5x clock modes and how they relate to the internal phase lock 
loop (PLL). 

The clock options and the fully static design of the 'C5x give the user a lot of
flexibility. As can be seen on page A-10 of the 'C5x User's Guide, two pins,
CLKMD1 and CLKMD2, select the clock mode in which the part operates. These modes
should not be changed unless the part is in reset (RS=0). A common mode is to
run the CPU at a rate that is a divide-by-two of the input clock. The
divide-by-two option and the associated CPU speeds for each 'C5x device are
shown below. Note that the CPU speed is given in nanoseconds. Because the ratio
of input clock frequency to instruction cycle frequency varies with the clock
mode, it makes the most sense to refer to the instruction cycle time or MIPS
rate rather than to a frequency.



Divide-by-Two Mode:

Part Name       Oscillator          CPU Speed
TMS320C5X       40 MHz (25 ns)      50 ns (20 MIPS)
TMS320C5X-57    57 MHz (17.5 ns)    35 ns (28.5 MIPS)
TMS320C5X-80    80 MHz (12.5 ns)    25 ns (40 MIPS)



There are two ways of achieving the divide-by-two clock mode: external crystal
or external oscillator. Option 3 (divide by two) on page A-10 in the 'C5x User's
Guide allows both inputs. If one chooses to use an external crystal, then set
CLKMD1=1, CLKMD2=1, and place the crystal across the X2/CLKIN1 and X1 pins
(see page 2-6 for location and function of these pins). This is the only CLKMD
option that allows the use of the crystal in a divide-by-two manner. The
internal oscillator generates a clock based on the crystal overtone. Option 3
can also accept an external square wave from a crystal oscillator. In this
particular case, the CLKMD pins remain set to 1 and 1, but the X2/CLKIN1 pin is
used as an input and X1 is left unconnected. As a result, the internal
oscillator is running needlessly and consuming power. Unless you plan on
switching between a crystal and a crystal oscillator (which is unlikely), it
makes more sense to use CLKMD option 4 with an external crystal oscillator
(i.e., CLKMD1=0, CLKMD2=0). In this case, the internal oscillator is shut off
and the CPU runs off the input on X2/CLKIN1 divided by two. Note in both cases
that the CPU clock may be varied within the specified range all the way down to
0 MHz, if the clock remains clean. In other words, if the clock is shut off
(cleanly), the device will retain its state.

As 'C5x devices get faster, the external oscillators required for operation
become very fast. High-frequency oscillators are not only expensive, but may
also generate unwanted noise and increase power dissipation. The internal
phase-locked loop (PLL) helps alleviate this problem. When CLKMD1=1 and CLKMD2=0
(PLL enabled), the internal PLL takes the input from the CLKIN2 pin to generate
the CPU clock. On all devices except the 'C52, the PLL is a x1 multiply. On a
'C52, the PLL is a frequency doubler (x2 multiply).

PLL Mode (Multiply-by-one):

Part Name*      Oscillator          CPU Speed
TMS320C5X       20 MHz (50 ns)      50 ns (20 MIPS)
TMS320C5X-57    28.5 MHz (35 ns)    35 ns (28.5 MIPS)
TMS320C5X-80    40 MHz (25 ns)      25 ns (40 MIPS)

*All parts except TMS320C52

PLL Mode (Multiply-by-two):

Part Name       Oscillator          CPU Speed
TMS320C52       10 MHz (100 ns)     50 ns (20 MIPS)
TMS320C52-57    14.25 MHz (70 ns)   35 ns (28.5 MIPS)
TMS320C52-80    20 MHz (50 ns)      25 ns (40 MIPS)



One disadvantage of using the PLL mode is that, like any PLL, it has a lock
range. The lock range is listed on page A-13 of the 'C5x User's Guide. As you
can see, the minimum frequency is not 0 MHz. Thus if clocks need to be shut off,
the device must be put into IDLE2 mode.

By examining all three tables, we can see that the CPU speeds of a particular
device are identical; only the oscillator frequencies change. In fact, the
'C5x's clock generation circuitry should be thought of as an external module to
the CPU which sets the instruction cycle time/frequency based on the input
clock and modes. The desired instruction cycle time (CPU speed), i.e., the MIPS
requirement, is determined first, and the appropriate clock mode is then
selected.
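The relationship in the three tables can be sketched as a single function that maps an input clock and a clock ratio to the instruction cycle time. The enum and function names are illustrative and not part of any TI tool; the ratios are the ones described in this note (divide-by-two, PLL x1, and PLL x2 on the 'C52).

```c
#include <assert.h>

/* Clock ratios discussed in this note (names are illustrative). */
enum clock_ratio { DIVIDE_BY_TWO, PLL_TIMES_ONE, PLL_TIMES_TWO };

/* Instruction cycle time in nanoseconds for an input clock given in MHz. */
double cycle_time_ns(double input_mhz, enum clock_ratio ratio)
{
    double cpu_mhz;

    assert(input_mhz > 0.0);
    switch (ratio) {
    case DIVIDE_BY_TWO:
        cpu_mhz = input_mhz / 2.0;
        break;
    case PLL_TIMES_ONE:
        cpu_mhz = input_mhz;
        break;
    default:                      /* PLL_TIMES_TWO */
        cpu_mhz = input_mhz * 2.0;
        break;
    }
    return 1000.0 / cpu_mhz;      /* period in ns = 1000 / frequency in MHz */
}
```

An 80-MHz input in divide-by-two mode, a 40-MHz input in PLL x1 mode, and a 20-MHz input in PLL x2 mode all give the same 25-ns cycle. The -57 parts come out near 35.1 ns, which the tables round to 35 ns.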



[Block diagram: clock generation circuitry (ASIC) with RESET, CLOCK, and CLKMD1
outputs and possible clock reduction from 40 MHz, driving the RS, X2/CLKIN1,
CLKIN2, CLKMD1, and CLKMD2 pins of a 'C51 DSP]

Figure 1. Optimal TMS320C5x 25-ns device design



Figure 1 shows a flexible design for a 25-ns 'C51 device that uses a 40-MHz
oscillator and all features of the 'C5x clock modes. The CLKMD pins may be
changed only while RESET is low. If placed in Option 4, then a 50-ns CPU speed
is available with optional clock speed reduction supplied by the external
logic. In PLL divide-by-one mode, the 'C5x operates as a 25-ns device, but must
be in IDLE2 mode in order to shut off the clocks. Clock speed reduction must be
within the PLL lock range.
TMS320C5X Wait States

Contributed by Joe George 



What is the difference between hardware and software wait states? 



Solution The 1993 'C5x User's Guide describes how wait states are treated on the 'C5x. But
some additional information is useful to tie it all together.
Two types of wait states are often spoken of:

1) Hardware wait states,

2) Software wait states.

H/W wait states are generated by external logic and connected to the 'C5x
READY pin. The 'C5x polls this pin on the falling edge of CLKOUT1 as shown on
pages A-16 and A-17. The setup and hold times shown on these pages should be
followed. Table A-13 gives these timings in relation to both RD/WE strobes and
CLKOUT1. Following either set is sufficient, depending on which set of memory
interface signals is used (see Designer's Note #45). But as the note on the
table describes, external ready is only sampled after S/W wait states are
completed.

The S/W wait-state generator is described in Section 5.3 of the 1993 'C5x
User's Guide. It is a very flexible on-chip peripheral that eliminates the need
for external wait-state logic. (Note that any internal access is always 0 wait
state.)

In general, the 'C5x takes one cycle for a read and two cycles for a write. But
in the case of READ-WRITE, WRITE-READ combinations, the write will take three
cycles. Also, there is a subtle difference between S/W (using the on-chip S/W
wait-state generator) and H/W (using the READY line) based wait states and
their bus cycles.

In the case of S/W wait states, "... the addition of a single wait state
generated by the on-chip software wait-state generator only affects the read
cycle ..." Thus for S/W wait states, the memory read/write cycle count for 0
wait states is 1/2, for 1 W/S is 2/2, for 2 W/S is 3/3, and for 3 W/S is 4/4.
Page 4-25 in the 'C5x User's Guide talks in detail about this. But since H/W
wait states are done by ready line polling, the memory read/write cycle count
for 0 wait states is 1/2, for 1 W/S is 2/3, for 2 W/S is 3/4, and for 3 W/S is
4/5. In summary:

No. of        H/W Wait State   H/W Wait State   S/W Wait State   S/W Wait State
Wait States   Read             Write            Read             Write
0             1                2                1                2
1             2                3                2                2
2             3                4                3                3
3             4                5                4                4
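The table follows a simple pattern that can be written down as four small functions. This is only a restatement of the table for 0 to 3 wait states; the function names are illustrative, and behavior beyond three wait states is not claimed here.

```c
#include <assert.h>

/* Cycles for an external access with H/W (READY pin) wait states. */
int hw_read_cycles(int ws)
{
    assert(ws >= 0 && ws <= 3);   /* range covered by the table */
    return 1 + ws;
}

int hw_write_cycles(int ws)
{
    assert(ws >= 0 && ws <= 3);
    return 2 + ws;
}

/* Cycles for an external access with S/W (on-chip generator) wait states. */
int sw_read_cycles(int ws)
{
    assert(ws >= 0 && ws <= 3);
    return 1 + ws;
}

int sw_write_cycles(int ws)
{
    assert(ws >= 0 && ws <= 3);
    /* The first S/W wait state only lengthens the read, so a write stays
     * at two cycles for both 0 and 1 wait states. */
    return (ws == 0) ? 2 : 1 + ws;
}
```

The only place the two mechanisms differ is the write column: an H/W wait state always adds a cycle to a write, whereas the first S/W wait state does not.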
Clocking Options on the TMS320C5x

Contributed by Eric Wilbur 



Design Problem 



There are three speed versions available on most TMS320C5x devices: 50 ns,
35 ns, and 25 ns. If you desire to run these devices at full speed, the clock
input required is 40 MHz, 57 MHz, and 80 MHz, respectively. In a standard
configuration, the input clock is divided by two to get the internal machine
cycle: 40 MHz/2 = 20 MHz = 50 ns. The CLKOUT1 pin will run at the same speed as
the internal machine rate. How can I reduce the frequency of the input clock?

The Flow of the Clock (from External to Internal) 

Let's look at the flow of clock information from external to internal. This flow will
help us understand the options we have in configuring the pins and modes.

[Block diagram: clock flow inside the 'C5x, from the oscillator (internal or
external) through the PLL (enabled for div-1, disabled for div-2) and the
divide-by-2 stage (always present) to the internal clock]





The 'C5x has five pins (CLKMD1, CLKMD2, X1, X2/CLKIN, CLKIN2) that can be used
to configure the proper mode and hook up the crystal or can oscillator. The
state of the CLKMD pins determines the internal clock options, such as whether
the PLL is enabled or disabled and the divide-down ratio (div 1, div 2, etc.).

The first item you must decide is whether a crystal or a can oscillator
(crystal + oscillator) will be used in the system. Then, you must decide whether
a divide-by-one or divide-by-two is required. Based on these decisions, the
'C5x can be properly configured. The following information details the hardware
hookups and the state of the CLKMD pins required to properly hook up your
system clock.
External Can Oscillator, Divide-by-2

This is probably the most popular option. The PLL will be disabled for this mode
because it is only used for the divide-by-1 option. The can oscillator output is
connected directly to the X2/CLKIN pin. The only decision left is whether you
want the internal oscillator enabled or disabled. But, wait a minute, why would
you want the internal oscillator enabled? This takes power and is not necessary.

True. In most cases, you will want the internal oscillator disabled to reduce
power. However, if you would like the option of replacing the can oscillator
with a crystal, all you have to do is make the replacement. No other changes
are necessary. The CLKMD values needed for this mode and for operating with
just an external crystal are the same.



[Schematic: can oscillator supplied from +5 V and GND, output connected to
X2/CLKIN]

Figure 2. Hardware hookup





CLKMD1   CLKMD2   Comments
0        0        Internal oscillator disabled, lower power mode for external
                  can oscillator, divide-by-2, PLL disabled.
1        1        Internal oscillator enabled, higher power than above, but
                  allows swap of can oscillator with a crystal with no change
                  in CLKMD1/2 values.

Figure 3. Clock modes



External Can Oscillator, Divide-by-1 

This option is preferred when a 40-MHz clock is undesirable because of EMI ef- 
fects, cost, etc. You can provide a 20-MHz clock on the input, select the divide- 
by-1 option, and the internal machine rate will be 20 MHz or 50 ns. Internally, 
the 20-MHz input is actually multiplied by two, then divided by two to create a 
divide-by-1 result. The PLL must be enabled in this option to accomplish the 
multiply-by-2 internally. The can oscillator output is hooked directly to CLKIN2 
and +5 V is hooked to X2/CLKIN. 




Figure 4. Hardware hookup 



CLKMD1   CLKMD2   Comments
1        0        PLL is enabled (required for the x2 multiply of the input),
                  internal oscillator is disabled (not needed), lower EMI,
                  X2/CLKIN must be connected to +5 V.

Figure 5. Clock modes



External Crystal, Divide-by-2 

This is the only option available for crystal users. The internal oscillator must be
enabled to convert the output of the crystal to a square wave. The outputs of the
crystal are hooked to the X1 and X2/CLKIN pins.




Figure 6. Hardware hookup 



CLKMD1   CLKMD2   Comments
1        1        Internal oscillator is enabled to convert crystal output to
                  a square wave, PLL is disabled (not needed).

Figure 7. Clock modes



Clock Summary 

Given below is a summary of all the options. 
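The options discussed in these notes can also be collected in a small lookup table. The sketch below is an illustration only: the struct layout and names are invented, the pin values are those stated in the text (0/0, 1/1, and 1/0), and the remaining combination is returned as "not covered" because it is not described here.

```c
#include <assert.h>
#include <stddef.h>

struct clock_mode {
    int clkmd1;
    int clkmd2;
    int pll_enabled;            /* 1 = PLL on */
    int internal_osc_enabled;   /* 1 = internal oscillator on */
    int divide_ratio;           /* input clock / machine rate */
    const char *clock_pin;      /* pin that receives the clock */
};

static const struct clock_mode modes[] = {
    /* can oscillator, divide-by-2, lowest power */
    { 0, 0, 0, 0, 2, "X2/CLKIN" },
    /* crystal (or can oscillator), divide-by-2 */
    { 1, 1, 0, 1, 2, "X2/CLKIN" },
    /* can oscillator, divide-by-1 through the PLL */
    { 1, 0, 1, 0, 1, "CLKIN2" },
};

/* Look up a CLKMD1/CLKMD2 combination; returns NULL when the combination
 * is not one of the modes covered in this note. */
const struct clock_mode *lookup_clock_mode(int clkmd1, int clkmd2)
{
    size_t i;

    assert((clkmd1 == 0 || clkmd1 == 1) && (clkmd2 == 0 || clkmd2 == 1));
    for (i = 0; i < sizeof modes / sizeof modes[0]; i++) {
        if (modes[i].clkmd1 == clkmd1 && modes[i].clkmd2 == clkmd2) {
            return &modes[i];
        }
    }
    return NULL;
}
```

A board bring-up check could call lookup_clock_mode() with the strapped pin values and confirm that the oscillator is wired to the pin the table names.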
TMS320C5x DSK Analog I/O

Contributed by Gerald Capwell 



Design Problem Sometimes the TMS320C5x DSK does not respond to a microphone input. Does 
the DSK require a pre-amp on the input and an output driver? 

The 'C5x DSK is designed to directly connect to a microphone and an 8 Ω speaker
via RCA jacks. The I/O RCA jacks are directly connected to the dual A/D and D/A
converter called the Analog Interface Circuit (AIC). In some instances, the
microphone does not generate the signal level required for the AIC to detect
and convert the signal. It is suggested that a dynamic or amplified (pre-amp)
microphone be used to ensure the appropriate signal level is achieved. The
output signal from the AIC can be heard if the connected speaker is small, low
power, and easy to drive.

An inexpensive option is to build a pre-amp for the input and a speaker driver for 
the output using a few operational amplifiers and other components. These are ex- 
plained below: 

Input Pre-Amp 

The AIC provides the user with two pairs of inputs called IN+, IN- and AUXIN+, 
AUXIN- and one pair of outputs called OUT+ OUT-. The AIC also has the option 
to run in single-ended or differential input modes. The DSK always runs in single- 
ended mode using the IN+ input, therefore the IN- pin is grounded. The design 
below takes the signal from the RCA jack (connected to IN+), amplifies the signal, 
and outputs it to the AUXIN+ pin. Hence, an unamplified (IN+) signal and an 
amplified (AUXIN+) signal are both available to the AIC by using the same RCA 
input jack. 

The main concern when designing a pre-amp for the 'C5x DSK is not to exceed
the maximum input voltage to the AIC. Referencing the AIC data sheet in the 'C5x
DSK User's Guide, page B-18, the IN+ and AUXIN+ maximum voltage in single-ended
mode is the supply voltage + 0.3 V. In this case, the supply voltage is 5.0 V,
so the op-amps drive approximately 4.5 V maximum (rail-to-rail). Typically a
dynamic microphone generates a range of 10-30 mV, therefore you can amplify the
input by 150 and still remain below the op-amp's saturation point and the AIC
maximum rating. The circuit shown in Figure 1 is a two-stage amplifier with a
variable gain of 100 to 150. The first stage gain Av = 10 is fixed and cascaded
with the second stage, which has a variable gain Av = 10 to 15.
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Figure 1. Pre-amp schematic 
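The headroom arithmetic above can be checked with a few lines of C. This is only a sketch of the calculation, not part of the DSK software; the function names are hypothetical, and the 5.3 V limit used in the usage note is simply the 5.0 V supply plus the 0.3 V margin quoted from the data sheet.

```c
#include <assert.h>

/* Peak pre-amp output in volts for a given microphone peak level (volts)
 * and total voltage gain. Pure arithmetic; no hardware is touched. */
static double preamp_peak_output(double mic_peak_v, double gain)
{
    assert(mic_peak_v >= 0.0 && gain >= 0.0);
    return mic_peak_v * gain;
}

/* Returns 1 when the amplified peak stays at or below limit_v, else 0. */
static int preamp_within_limit(double mic_peak_v, double gain, double limit_v)
{
    return preamp_peak_output(mic_peak_v, gain) <= limit_v;
}
```

For example, preamp_within_limit(0.030, 150.0, 5.3) is true (about 4.5 V peak), while a gain of 200 with the same 30 mV microphone would give about 6 V and fail the check.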



The components boxed in with dashed lines may be required to minimize the 
effects of the DC offsets associated with the operational amplifiers, 
especially when cascading them. Since the circuit multiplies the input 
by only 150, the effect of the DC offset should be minimal. The offset may vary 
depending on which op-amp is used. To minimize offset, null the op-amp using 
its null pins, if available, and make sure R1 and R2 are equivalent 
to R1c || R1f and R2c || R2f, respectively. 

The input of the pre-amp is connected to the RCA input jack, which is actually 
the IN+ pin. The IN+ pin can be accessed at pin 8 of the JP4 header on the DSK, 
illustrated in Figure 2. The output of the pre-amp is connected to AUXIN+, 
located at pin 6 of the JP4 header. The AUXIN- pin, located at pin 5 of the 
JP4 header, should be grounded. Power (5.0 V) and ground are available at the 
pins illustrated in Figure 2. 







[Figure: DSK JP2 and JP4 headers; labeled signals: OUT-, OUT+, GND, AUXIN-, AUXIN+, VCC+, VCC-, IN+]









Figure 2. DSK JP2 and JP4 header connections 



Output Driver 

The output driver shown below is a simple buffer or voltage follower. The 
op-amp output impedance is minimal and is capable of driving an 8 Ω speaker 
through the decoupling capacitor. 

The input to the driver is connected to the OUT+ pin located at pin 4 of the 
JP4 header. The output of the driver must be connected to an additional RCA 
jack. You CANNOT use the existing output jack, since it is connected to the 
OUT+ pin. 

NOTE: Do not connect the driver output to the existing DSK output jack. 
This may severely damage the DSK. 

In order for the AIC to select the AUXIN+ amplified signal, you must set bit 4 of 
the AIC Control Register when initializing the AIC (see page B-14 of the 'C5x 
DSK User's Guide). The output driver can drive most 8 Ω desktop speakers. 

Bootloading a 'C4x Network - Part 1: Direct Connect System 

Contributed by Gerald Capwell 



If the 'C4x devices are directly connected to each other, how can you perform an 
automatic system boot-up at hardware reset? 

In this case the system is called a "Known, Direct Connect" network. In order to 
configure a network in this format, the network must have the following criteria: 

• The system has one dedicated device known as the system "Parent." The parent 
device is configured in hardware via the IIOFx pins to boot from external 
memory (see section 13.2 of the TMS320C4x User's Guide) or from a PC platform. 
The RESETLOC(1,0) pins must be low in order for the on-chip ROM bootloader 
to load programs from external memory. 

• All devices (except the system parent) are called system "Children." The system 
child is dependent upon the parent for bootloading. The system children are con- 
figured in hardware via the IIOFx pins to boot from commports (see section 13.2 
of the TMS320C4x User's Guide). 

• The system parent knows where (which commports) the children are connected. 

• Each parent and child connection is direct. There are no intermediate devices 
involved in order for the parent to bootload the child. 




Figure 1. Known, direct connect network 

Figure 1 shows the system parent directly connected to the children. At a 
system reset the parent bootloads from external memory (PROM) and begins 
performing its tasks as the system parent. Each child polls its commports (a 
function of the on-chip ROM bootloader) to see if the parent is attempting a 
boot. The parent knows where (on which commport) each child is connected. The 
example C code below demonstrates the process the parent follows to bootload 
the network. Pointers are initialized to the beginning and end of the boot 
table. The pointer to the beginning of the boot table is incremented after each 
word is transmitted to the child (out_word command) and continues until the 
end of the boot table is reached. At this point the child is booted, and the 
parent repeats the process for all system children. 



int port_addr = 0x100040;              /* init port pointer to commport 0 */
extern int beg_label, end_label;       /* init external labels of the boot table */
int port[6] = {1, 2, -1, 3, -1, -1};   /* init CP connection matrix, -1's indicate no connect */
volatile int *table_ptr, *end_ptr;     /* init the boot table pointer and end table pointer */

main()
{
    int CPX;

    end_ptr = &end_label;              /* set end pointer to the end of boot table */

    for (CPX = 0; CPX < 6; CPX++)      /* Continue for all six commports */
    {
        if (port[CPX] != -1)           /* If child is connected then boot (!= -1) */
        {
            table_ptr = &beg_label;    /* Set table ptr to the beginning of boot table */
            port_addr = 0x100040 + (0x10 * CPX);   /* Point to this commport's address */

            do
            {
                /* Copy data at table_ptr to cp out mem. loc */
                *(volatile int *)(port_addr + 2) = *table_ptr;
                table_ptr++;           /* Increment ptr to next word of boot table */
            } while (table_ptr <= end_ptr);   /* Continue until complete boot table is */
                                              /* transmitted through CPX. */
        }
    }
}

Figure 2. System parent 
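
The transmit loop can also be exercised off-target. The following is a minimal host-side sketch, not the parent code itself: the comm-port write is replaced by a caller-supplied out_word callback, the end pointer is treated as inclusive (it points at the last word of the table), and all names other than out_word are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that delivers one boot-table word to one comm port. On the
 * target this would write the port's output FIFO; here it is abstract. */
typedef void (*out_word_fn)(int port, unsigned long word);

/* Demo stand-in for the hardware write: it only counts words. */
static size_t words_recorded;
static void record_word(int port, unsigned long word)
{
    (void)port;
    (void)word;
    words_recorded++;
}

/* Sends every word from first to last (inclusive) to one comm port and
 * returns the number of words sent. */
static size_t boot_child(int port, const unsigned long *first,
                         const unsigned long *last, out_word_fn out_word)
{
    size_t sent = 0;
    const unsigned long *p;

    assert(first != NULL && last != NULL && first <= last);
    assert(out_word != NULL);
    for (p = first; p <= last; p++) {
        out_word(port, *p);
        sent++;
    }
    return sent;
}

/* Boots every connected child; entries of -1 in port_map mean
 * "no connect". Returns the number of children booted. */
static size_t boot_all(const int port_map[6], const unsigned long *first,
                       const unsigned long *last, out_word_fn out_word)
{
    size_t booted = 0;
    int cp;

    for (cp = 0; cp < 6; cp++) {
        if (port_map[cp] != -1) {
            boot_child(cp, first, last, out_word);
            booted++;
        }
    }
    return booted;
}
```

With the connection matrix {1, 2, -1, 3, -1, -1} and a three-word table, boot_all() boots three children and the callback sees nine words in total.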



When the first data word is sent from the parent, the child locks onto the 
commport, stores the commport address to register AR3, and continues to receive 
data until the termination word is received. If the child does not receive the com- 
plete boot table in the correct format, the child will never completely boot-up. 
Please refer to section 13.2 of the TMS320C4x User's Guide to learn more about 
the boot table requirements. 

The parent code in Figure 2 uses external labels to reference the boot table. 
The external labels are resolved when the boot table is linked to the parent's ap- 
plication code. The code which resides in the parent is shown in Figure 4. 

To create the boot table and link it to the parent's application, there are sev- 
eral steps which must be taken to convert the child's application code. The steps 
are listed below: 



• The child application should remain self-contained, assembled, and linked as 
an independent application (child.out). 

• Use the Hex30 Conversion Utility to convert the .out file to a PROM 
programming file in boot table format. Please read the Hex30 Utility Addendum 
(Lit# SPRU081) to learn more about creating a boot table (-boot command) using 
Hex30. An example Hex30 command file is shown in Figure 3: 



child.out          /* Specify input COFF file                 */
-o child.hex       /* Specify output filename                 */
-a                 /* Convert file to Hex30 ASCII format      */
-memwidth 32       /* Word length of device (32 bits wide)    */
-romwidth 32       /* Specify 32 to convert the complete word */
-boot              /* Hex30 to construct boot table header    */



Figure 3. Hex30 command file 



• Now use the HEX2ASM conversion utility (executables and directions available 
on the TMS320 BBS in the archived file hex2asm.exe) to convert the Hex30 
programmer file (child.hex) to an assembly file (child.asm). The 
HEX2ASM utility extracts the valuable data, creates a .word xxxxxh for each 
32-bit word, and saves it in a .sect table. The HEX2ASM converter can also 
include global labels. In Figure 5, the global labels which the parent references 
are beg_label and end_label. 



Note: Labels can also be defined in the final linker .cmd file 
when linking the child boot table to the parent application code. This is 
accomplished by including the following in the linker .cmd file: 



SECTIONS
{
    .child: { _beg_label = .;
              *(.child)
              _end_label = . - 1; } > RAM1
}



Figure 4. Defining labels in the linker command file 



• The output of the HEX2ASM converter is an assembly file. Assemble it and link 
the resulting child.obj file with the parent's parent.obj file. 



[Figure: the parent application followed by the child boot table; the labels 
_beg_label and _end_label mark the start and end of the child boot table] 

Figure 5. Parent's memory map 

At this point the parent application is linked to the child application (in boot 
table format) as illustrated in Figure 5. At reset, the parent device must bootload 
this information from external memory or a PC platform. If the parent boots from 
external memory the final (parent+child) code must be converted again by the 
Hex30 utility to create the PROM programmer file. If booting the parent via a 
PC platform, the data must be converted again in order for the PC to read and 
transmit the boot information properly. 

Emulator Processor Access Timeout 

Contributed by Rosemarie Piedra 



What would cause a "processor access timeout" message in the XDS510 emulator 
command window? Is the address provided in the error message meaningful? 



Solution There are two basic causes: 
1 ) The device is in reset. 

Attempts to execute (single-step or run) a program when the DSP is in reset may 
cause the "processor access timeout" message to be displayed in the COMMAND 
window. 

The device will always "time out" when using "run" or "runf." When 
single-stepping, the behavior may depend on the DSP silicon version you are 
using. For example, with 'C4x PG2.x silicon, you may be able to step without the 
error message. That is not the case with PG1.x or PG3.x and above. 

2) A current access that is being truncated. 

The debugger will break any pending CPU/DMA access that is not completed 
within a timeout period (one second) during single-stepping or after an emulator 
halt. This could happen in the following situations: 

A) Access to a device peripheral that is not ready. 

For example, in the 'C4x, when DMA/CPU reads from an empty IFIFO, or when 
the DMA/CPU writes to a full OFIFO. 



Solution: Check the level of the comm port FIFOs before accessing them. In the 
DMA case, you should use DMA synchronization. 

Data in the comm port's input FIFOs can only be read once. The debugger dis- 
plays memory values by reading them from the 'C4x. It is best, therefore, to avoid 
displaying the input FIFOs in any of the debugger windows as this will cause the data 
to be unavailable to your program. In the example emuinit.cmd file shipped with the 
debugger, the memory map commands that define the comm ports' FIFOs are com- 
mented-out to avoid this problem, but still show how they might be defined if neces- 
sary. 
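
A level check of this kind might look like the sketch below. It only decodes a value already read from a comm port control register; the field positions (output level in bits 8-5, input level in bits 12-9) and the 8-word FIFO depth are assumptions to be verified against the TMS320C4x User's Guide, and all names are hypothetical.

```c
#include <assert.h>

/* Assumed layout of the comm port control register (verify against the
 * TMS320C4x User's Guide): output level in bits 8-5, input level in
 * bits 12-9, each FIFO eight words deep. */
#define CPCR_OUT_LEVEL_SHIFT 5
#define CPCR_IN_LEVEL_SHIFT  9
#define CPCR_LEVEL_MASK      0xFul
#define CP_FIFO_DEPTH        8u

static unsigned int cp_output_level(unsigned long cpcr)
{
    return (unsigned int)((cpcr >> CPCR_OUT_LEVEL_SHIFT) & CPCR_LEVEL_MASK);
}

static unsigned int cp_input_level(unsigned long cpcr)
{
    return (unsigned int)((cpcr >> CPCR_IN_LEVEL_SHIFT) & CPCR_LEVEL_MASK);
}

/* A read is safe only when the input FIFO holds at least one word. */
static int cp_can_read(unsigned long cpcr)
{
    return cp_input_level(cpcr) != 0u;
}

/* A write is safe only when the output FIFO is not full. Any level code
 * at or above the FIFO depth is treated as "full". */
static int cp_can_write(unsigned long cpcr)
{
    return cp_output_level(cpcr) < CP_FIFO_DEPTH;
}
```

Polling cp_can_read() before reading the input FIFO, or cp_can_write() before writing the output FIFO, avoids the stalled access that the emulator reports as a timeout.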

B) During execution of a large "Repeat-Single" instruction. 

The "repeat-single" instruction and the instruction that is being repeated are 
considered one single instruction from the emulator's point of view. The 
emulator can only single-step between instruction fetches. In the case of RPTS 
('C4x/'C3x repeat-single), the instruction that is being repeated is fetched only 
once. 

C) During interlocked instructions or during any instruction that accesses 
memory that is not ready. 

WARNING: Different debugger versions may present a different behavior: 

For example, in the 'C4x XDS510, debugger versions 2.20 or higher will send 
a "processor access timeout addr=xxxx" message if a read or write access doesn't 
complete. However, versions 2.01 and lower may not send any warning message if 
a read access is broken. This "incomplete read" can be misinterpreted as an access 
completion. 

The addr=xxxx provided in the error message will approximately correspond 
to the address where the timeout occurs when this is the result of a "debugger" 
access timeout, for example, when displaying memory that is not ready. In the 
case of a CPU/DMA timeout, you will probably receive the address 
0xdeadc0de, which is not meaningful. In this case, the 3-4 instructions 
preceding the PC value (in the case of a DSP with a 4-deep pipeline) should 
give you an indication of where the timeout occurs. 

Sometimes a different message can be caused by the same problem. An "invalid 
operand" message coming from the debugger expression analyzer could be an 
indication that a timeout has happened. For example, when you type "? *0xxxxxx" in 
the command window and you receive the "invalid operand" message, this may 
imply that the debugger timed out when accessing that memory location. 

Extending Fixed-Point Dynamic Ranges 

Contributed by Alex Tessarob 



Design Problem 



How can you extend the fixed-point math dynamic range beyond the range of a Q15 
number with a minimum of instructions? 

In many advanced control problems such as state estimators, Kalman filters and 
some high Q filters, the dynamic range/accuracy of the coefficient can sometimes be 
beyond the range of a Q 15 number while the data value can be typically represented 
as a Q15 number. 

Aside from trying to dynamically scale the coefficients to extract as much 
accuracy as possible, or using floating-point math, there is a technique that 
can perform 32-bit x 16-bit math at an effective 4 cycles per tap, and 
potentially 2 cycles per tap for systems larger than 6th order (plus a fixed 
overhead of about 8-13 cycles). 

The trick is to re-scale the numbers and represent the problem as an integer 
value + a fractional value. For example: 

Y = 2.391456*X0 - 0.0235045*X1 + 0.000329758*X2 - 34.3392345*X3 

In the above equation, the filter coefficients have a dynamic range exceeding a 
16-bit Q15 number. If we re-scale the problem as follows: 

Y = [1224.425472*X0 - 12.034304*X1 + 0.168836096*X2 - 17581.68806*X3]/512 

and then allocate the following coefficient values: 

Y = [(A0i+A0f)*X0 + (A1i+A1f)*X1 + (A2i+A2f)*X2 + (A3i+A3f)*X3]/512 



where: 

A0i = 1224          = 04C8h 
A0f = 0.425472      = 3676h (= 0.425476074) 
A1i = -12           = FFF4h 
A1f = -0.034304     = FB9Ch (= -0.034301758) 
A2i = 0             = 0000h 
A2f = 0.168836096   = 159Ch (= 0.168823242) 
A3i = -17581        = BB53h 
A3f = -0.68806      = A7EEh (= -0.688049316) 
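
The table above can be reproduced with a short host-side calculation. This is a sketch under stated assumptions, not code from the original note: the scale factor 512 and the Q15 fractional format come from the text, while the type and function names are hypothetical. The integer part is obtained by truncating toward zero, so the fractional part keeps the sign of the coefficient.

```c
#include <assert.h>

#define COEFF_SCALE 512.0      /* common scale factor used in the text */
#define Q15_ONE     32768.0    /* 1.0 in Q15 */

/* Integer and fractional (Q15) parts of one re-scaled coefficient. */
typedef struct {
    int ipart;   /* e.g. 1224 for 2.391456 * 512 */
    int fpart;   /* Q15 value, e.g. 0x3676 for 0.425472 */
} split_coeff_t;

/* Rounds to the nearest integer, halves away from zero. */
static long round_nearest(double v)
{
    return (long)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Splits a * 512 into a 16-bit integer part and a Q15 fractional part. */
static split_coeff_t split_coeff(double a)
{
    split_coeff_t s;
    double scaled = a * COEFF_SCALE;
    long whole = (long)scaled;              /* truncates toward zero */

    assert(whole >= -32768 && whole <= 32767);
    s.ipart = (int)whole;
    s.fpart = (int)round_nearest((scaled - (double)whole) * Q15_ONE);
    return s;
}
```

For 2.391456 this yields 1224 and 3676h; for -34.3392345 it yields -17581 and -22546, whose 16-bit two's-complement pattern is A7EEh.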

The problem then reduces to calculating the following: 

Y = [(A0i*X0 + A1i*X1 + A2i*X2 + A3i*X3) 
   + (A0f*X0 + A1f*X1 + A2f*X2 + A3f*X3)]/512 

This is like calculating two filter banks. The above problem is coded in the 
example below: 



Assume: X0,X1,X2,X3 = Q15 (range -1 to +0.999969482) 
        Y           = Q10 (range -32 to +31.99902344) 
        Ymin-max    = 2.391456 + 0.0235045 + 0.000329758 + 34.3392345 
                    = +/- 36.75452476 




Sat    = 06000h 
Round  = 08000h 

        SETC   OVM          ; Enable saturation. 
        SETC   SXM          ; Enable sign extension. 
        SPM    3            ; Set shift mode = -6 
        LT     A0f 
        MPY    X0           ; P = A0f*X0 
        LTP    A1f          ; ACC = A0f*X0 
        MPY    X1           ; P = A1f*X1 
        LTA    A2f          ; ACC = ACC + A1f*X1 
        MPY    X2           ; P = A2f*X2 
        LTA    A3f          ; ACC = ACC + A2f*X2 
        MPY    X3           ; P = A3f*X3 
        LTA    A0i          ; ACC = ACC + A3f*X3 
        SPM    0 
        SACH   Temp,6       ; On C5X replace by BSAR 9 instruction. 
        LAC    Temp,1       ; ACC = ACC/512 
        MPY    X0           ; P = A0i*X0 
        LTA    A1i          ; ACC = ACC + A0i*X0 
        MPY    X1           ; P = A1i*X1 
        LTA    A2i          ; ACC = ACC + A1i*X1 
        MPY    X2           ; P = A2i*X2 
        LTA    A3i          ; ACC = ACC + A2i*X2 
        MPY    X3           ; P = A3i*X3 
        APAC                ; ACC = ACC + A3i*X3 
        ADDS   Round        ; Round result. 
        ADDH   Sat          ; Saturate Y to Q10 value 
        SUBH   Sat 
        SUBH   Sat 
        ADDH   Sat 
        SACH   Y,1          ; Y = Q10 number. 

; Cycles = 13 + 4n cycles (n = number of taps) . 

; Note: If saturation is not required, Cycles = 8 + 4n cycles 



Figure 1. 



If the number of taps is greater than 6, then an RPT loop can be used for each 
bank and the effective cycles/tap can be approximately 2. 

The above technique is almost equivalent to a floating-point notation with a 
4-bit exponent and a 16-bit mantissa. 
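
The two-bank arithmetic can be modeled on a host in plain C. The sketch below is an illustration of the scaling only and is not cycle- or bit-exact with the assembly listing: it uses 64-bit accumulators, truncating division instead of the DSP's shifts and rounding, and explicit clamping instead of the ADDH/SUBH saturation sequence. The function name is hypothetical.

```c
#include <assert.h>

#define NTAPS 4

/* Computes Y in Q10 from integer parts ai[], Q15 fractional parts af[]
 * and Q15 inputs x[], following
 *   Y = [sum(ai*x) + sum(af*x)]/512.
 * ai[k]*x[k] is in units of 2^-24 of Y; af[k]*x[k] is 2^15 times finer. */
static int two_bank_q10(const int ai[NTAPS], const int af[NTAPS],
                        const int x[NTAPS])
{
    long long int_bank = 0;
    long long frac_bank = 0;
    long long y;
    int k;

    for (k = 0; k < NTAPS; k++) {
        assert(x[k] >= -32768 && x[k] <= 32767);
        int_bank  += (long long)ai[k] * x[k];
        frac_bank += (long long)af[k] * x[k];
    }

    /* Align the fractional bank with the integer bank, then go from
     * Y * 2^24 to Q10 (Y * 2^10). Division truncates toward zero. */
    y = (int_bank + frac_bank / 32768) / 16384;

    if (y > 32767)  y = 32767;     /* saturate to the Q10 range */
    if (y < -32768) y = -32768;
    return (int)y;
}
```

With the coefficient table from the text and X0 = 0.5 (16384 in Q15) and the other inputs zero, the result is 1224, that is 1224/1024 = 1.195, close to 2.391456 x 0.5. With X3 = -1.0 the ideal result of about +34.3 exceeds the Q10 range and clamps to 32767.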

Accessing TMS320C5x Memory-Mapped Registers in C: 
C5XREGS.H 

Contributed by Leor Brenman 



Design Problem How do I access the TMS320C5x memory-mapped registers in C? 

Solution Accessing most of the 'C5x registers from C is easily accomplished using 
pointers, since most of the registers are memory mapped. The most common reason 
for accessing memory-mapped registers is to control the 'C5x peripherals. Refer 
to the 'C5x User's Guide for a list of the memory-mapped registers and their 
associated addresses. As an example, the 'C5x serial-port control register, SPC, 
memory mapped at address 0x0022, could be declared in C as follows: 

volatile unsigned int *spcr = (volatile unsigned int *) 0x0022; 

Note the volatile modifier since this register changes independent of program 
control. The register can be written to, and read from, as follows: 



*spcr = 0xc8;                         /* Load SPC with 0xc8 */ 
currentXRDYValue = *spcr & 0x800;     /* Check XRDY bit of SPC */ 



However, this does not lead to the most readable code. By using bit-field data 
structures to describe the bit fields of the register, more readable code can be devel- 
oped. For example, consider the following data structure for the serial-port control 
register. 

typedef union 
{ 
    unsigned int intval; 
    struct 
    { 
        unsigned int r_0      :1;   /* Reserved */ 
        unsigned int dlb      :1;   /* Dig Loopback Mode */ 
        unsigned int fo       :1;   /* Format */ 
        unsigned int fsm      :1;   /* Frame Synch Mode */ 
        unsigned int mcm      :1;   /* Clock Mode */ 
        unsigned int txm      :1;   /* Transmit Mode */ 
        unsigned int xrst     :1;   /* Transmit Reset */ 
        unsigned int rrst     :1;   /* Receive Reset */ 
        unsigned int in0      :1;   /* Input 0 */ 
        unsigned int in1      :1;   /* Input 1 */ 
        unsigned int rrdy     :1;   /* Receive Ready */ 
        unsigned int xrdy     :1;   /* Transmit Ready */ 
        unsigned int xsrempty :1;   /* Xmt Shift Reg Empty */ 
        unsigned int rsrfull  :1;   /* Rec Shift Reg Full */ 
        unsigned int soft     :1;   /* Soft */ 
        unsigned int free     :1;   /* Free run */ 
    } bitval; 
} SPC_REG; 



The bit XRDY can now be read as follows: 

volatile SPC_REG *spcPtr = (volatile SPC_REG *) 0x0022; 
currentXRDYValue = spcPtr->bitval.xrdy; 



The TMS320 BBS contains the self-extracting file, C5XREGS.EXE. This file 
contains a C header file, C5XREGS.H, that can be included (ie, #included) in 
your C programs to assist in accessing 'C5x peripheral registers as well as all of the 
'C5x memory-mapped registers. Where appropriate, bit-field data structures are 
also defined. The remainder of this document will describe its usage. 

To use C5XREGS.H, simply include the file in your C program. Each memory- 
mapped register has two entities associated with it: ( 1 ) a macro that defines its 
address and (2) a type definition that describes the bit fields and the memory- 
mapped register. The macros for the address have two components for each regis- 
ter: one for the actual address and one to typecast the address as a pointer to a 
data structure that defines the memory-mapped register. The following code seg- 
ment describes the address macros for the serial-port control register: 

#define SPC_BASE 0x0022 

#define SPC_ADDR ((volatile SPC_REG *) ((char *) SPC_BASE)) 

Two different methods have been used to type define the registers. For registers 
with bit fields, such as the SPC and interrupt mask register, IMR, data structures 
have been created that comprise a union of a 16-bit integer component, named 
intval, and a bit-field component, named bitval. This also includes registers that 
have some reserved component, such as the 5-bit TREG1 register. The bit-field 
data structure for the serial-port control register given above is such an example. 
Registers that have no bit field definition, such as the serial-port receive register, 
DRR, are defined as either signed or unsigned integers. 

To access registers defined as bit-field data structures, use the following syntax: 

/* Set FSM, XRST, and RRST bits of the SPC */ 
SPC_ADDR->intval = 0xc8; 

To increase the readability of such assignments, macros for setting the bits 
have also been defined. The following example illustrates the use of these macros 
to accomplish the same thing: 



/* Set FSM, XRST, and RRST bits of the SPC */ 
SPC_ADDR->intval = FSM | XRST | RRST; 



The previous examples set the serial port for Frame Sync Mode and reset the 
transmit and receive sides of the serial port. Additional macros have been defined 
so that the user only needs to type SPC instead of SPC_ADDR->intval. Therefore 
the last example could be expressed as follows: 

SPC = FSM | XRST | RRST; 
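
The relationship between the bit macros and the value 0xc8 can be checked with a host-side sketch. The mask values below are not copied from C5XREGS.H; they are assumptions derived from the bit order of the SPC_REG structure shown earlier (fsm is bit 3, xrst bit 6, rrst bit 7, xrdy bit 11). A plain variable stands in for the memory-mapped register, and all names carry an SPC_ prefix to mark them as hypothetical.

```c
#include <assert.h>

/* Assumed mask values, derived from the bit order of SPC_REG above. */
#define SPC_FSM   0x0008u   /* Frame Synch Mode, bit 3  */
#define SPC_XRST  0x0040u   /* Transmit Reset,   bit 6  */
#define SPC_RRST  0x0080u   /* Receive Reset,    bit 7  */
#define SPC_XRDY  0x0800u   /* Transmit Ready,   bit 11 */

/* Host stand-in for the register at 0x0022; on the 'C5x a pointer to
 * the real address would be passed instead. */
static volatile unsigned int fake_spc;

/* Selects frame sync mode and sets the XRST and RRST bits. */
static void spc_setup_frame_sync(volatile unsigned int *spc)
{
    assert(spc != 0);
    *spc = SPC_FSM | SPC_XRST | SPC_RRST;
}

/* Returns 1 when the XRDY bit is set, else 0. */
static int spc_xrdy(const volatile unsigned int *spc)
{
    assert(spc != 0);
    return (*spc & SPC_XRDY) != 0;
}
```

After spc_setup_frame_sync(&fake_spc) the stand-in register holds 0xc8, the same value written explicitly in the earlier example.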

Alternatively, the bit fields could have been used as follows to accomplish the 
same task: 



SPC_ADDR->bitval.fsm = 1; 
SPC_ADDR->bitval.xrst = 1; 
SPC_ADDR->bitval.rrst = 1; 



To access registers that are not defined as bit-field data structures, use the follow- 
ing syntax: 

*DXR = outputValue; 

The previous example writes outputValue to the serial-port transmit register. 

To declare a pointer to the serial-port control register, use the following syntax: 

volatile SPC_REG *spcr = SPC_ADDR; 

The register is accessed as follows: 

spcr->intval = 0xc8; 
fsmbit = spcr->bitval.fsm; 

C Routines for Setting Up the AIC on the TMS320C5x EVM 



Contributed by Leor Brenman 



Design Problem How do I control the AIC on the TMS320C5x EVM in C? 

Solution Programming the 'C5x to communicate with the AIC on the EVM involves ( 1 ) set- 
ting up the 'C5x serial port and (2) resetting and (3) configuring the AIC. Since the 
'C5x serial-port registers (SPC, DXR, and DRR) are memory mapped, setting up the 
'C5x serial port and reading and writing to the serial-port-receive and transmit regis- 
ters is easily accomplished in C. For example, the 'C5x serial-port-control register, 
memory mapped at address 0x0022, could be declared in C as follows: 

volatile unsigned int *SPC = (volatile unsigned int *) 0x0022; 

and accessed as follows: 

SPCVALUE = *SPC; 
*SPC = 0x00c8; 

Resetting the AIC is accomplished by reading from, and writing to a memory- 
mapped I/O port connected to a target control register, which is also easily accom- 
plished in C. Finally, configuring the AIC is achieved by writing values to the AIC 
through the 'C5x serial port. The TMS320 BBS self-extracting file, EVMAIC5X.EXE, 
contains code necessary to create a library, EVMAIC5X.LIB, that contains the fol- 
lowing two functions: 

void initAic (AIC_CONFIGURATION *aicParams) ; 

void getDefaultAicConfig(AIC_CONFIGURATION *aicParams) ; 

The functions are defined in the file EVMAIC5X.C and are prototyped in the 
header file EVMAIC5X.H. This file should be included (ie, #include) in user pro- 
grams that link to the EVMAIC5X.LIB library. Also included in the file, 
EVMAIC5X.EXE, is demo code that illustrates how to build an application that uses 
the library functions. 

The code is written entirely in C and provides a starting point for users writing C 
code to interface 'C5x processors to TLC3204x AICs on any hardware platform. 
The code demonstrates the following: 

1. Communicating to the AIC in primary and secondary modes in C. 

2. Using bit fields in C. 

3. Accessing memory-mapped DSP registers and I/O peripherals in C. 

4. Using the C language extension asm statement. 

Usage 

The user should call the function initAic() after all other processor and 
variable initialization is complete. The function globally enables interrupts and 
enables serial-port-receive interrupts only. After the function is called, 
serial-port-receive interrupts must be ready to be serviced. That is, a 
serial-port-receive interrupt service routine (ISR) vector must be installed and 
a serial-port-receive ISR must be defined. The default AIC configuration for this 
library is for synchronous operation. Other AIC default parameters can be found 
in the function getDefaultAicConfig(). To use the default configuration, call 
the function initAic() with NULL as a parameter, as follows: 

initAic (NULL) ; 

If the AIC is to be operated in asynchronous mode, the user's code must 
enable serial-port-transmit interrupts after calling initAic() and be prepared 
to service serial-port-transmit interrupts as well as receive interrupts. In 
addition, to modify any other default AIC configuration set up by the initAic() 
function, simply pass the AIC configuration parameters to the function initAic() 
in a data structure of type AIC_CONFIGURATION, defined as follows: 

typedef struct 
{ 
    AIC_COMMAND_0 command0; 
    AIC_COMMAND_1 command1; 
    AIC_COMMAND_2 command2; 
    AIC_COMMAND_3 command3; 
} AIC_CONFIGURATION; 

This structure definition, as well as those for AIC_COMMAND_0 through 
AIC_COMMAND_3, can be found in the header file EVMAIC5X.H. As an example, 
the bit-field structure definition for AIC_COMMAND_3 is shown below: 

typedef struct 
{ 
    unsigned int command  :2; 
    unsigned int highpass :1; 
    unsigned int loopback :1; 
    unsigned int aux      :1; 
    unsigned int sync     :1; 
    unsigned int gain     :2; 
    unsigned int d_8      :1; 
    unsigned int sinx     :1; 
    unsigned int d10out   :1; 
    unsigned int d11out   :1; 
    unsigned int d_cdef   :4; 
} AIC_COMMAND_3; 

The function getDefaultAicConfig() returns the default configuration. 
If only a few configuration parameters need to be altered, the user can call this 
function and then change the parameters as needed. The following example 
illustrates how to do this: 

1. Create a variable of type AIC_CONFIGURATION: 
AIC_CONFIGURATION aicParams; 

2. Pass the address of this variable to the function getDefaultAicConfig(): 
getDefaultAicConfig(&aicParams); 

3. Modify the parameters as necessary: 
aicParams.command3.loopback = 1; 

4. Call the function initAic() with the address of the AIC parameter variable: 
initAic(&aicParams); 

One additional data structure, AIC_PRIMARY, has been defined in 
EVMAIC5X.H. It is useful for handling the primary communications data (ie, the 
speech or audio) from the AIC in your interrupt service routine. With this data 
structure, the data shifting required by the transmit and receive data format of 
the AIC is accomplished automatically. It is defined as follows: 

typedef union 

{ 

unsigned int _intval; 

struct 

{ 

unsigned int command :2; /* Must be initialized to 0 */ 
signed int data :14; 
} _bitval; 
} AIC_PRIMARY ; 

Assume a global variable called aicPrimary of type AIC_PRIMARY has been 
declared; data is read from the 'C5x serial-port receive register, DRR, as follows: 

#define DRR ((volatile unsigned int *) 0x0020) 

aicPrimary._bitval.command = 0;          /* Initialize lower 2 bits to 0 */ 
aicPrimary._intval = *DRR;               /* Read all 16 bits */ 
dataReceived = aicPrimary._bitval.data;  /* Useful data is the upper 14 bits */ 

Similarly, the following technique can be used to output data: 

#define DXR ((volatile unsigned int *) 0x0021) 

aicPrimary._bitval.data = xmitData;      /* Write data to upper 14 bits */ 
*DXR = aicPrimary._intval;               /* Transmit all 16 bits */ 

In the previous two code segments, notice the alternative method used to access 
DXR and DRR, the serial-port-transmit and receive registers in C. 
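
Bit-field layout is compiler dependent, so it can be useful to see the same packing written with explicit shifts. The sketch below is an alternative to the AIC_PRIMARY union, not the library's implementation; the function names are hypothetical. It assumes, as the text states, that the sample occupies the upper 14 bits of the 16-bit word and the two command bits occupy the lower 2 bits.

```c
#include <assert.h>

/* Packs a signed 14-bit sample into the upper 14 bits of a 16-bit word,
 * with the two command bits (bits 1-0) cleared. */
static unsigned int aic_pack(int sample14)
{
    assert(sample14 >= -8192 && sample14 <= 8191);
    return ((unsigned int)sample14 << 2) & 0xFFFCu;
}

/* Extracts the signed 14-bit sample from the upper 14 bits of a 16-bit
 * word, ignoring the two low-order command bits. */
static int aic_unpack(unsigned int word16)
{
    int v = (int)((word16 & 0xFFFFu) >> 2);   /* 0..16383 */

    if (v >= 8192)                            /* sign-extend 14 bits */
        v -= 16384;
    return v;
}
```

For example, aic_pack(-1) gives FFFCh and aic_unpack(0xFFFC) gives -1 again, which is the shift the union performs implicitly.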



The following code fragment illustrates the necessary components of a com- 
plete application to use this library: 

#include "evmaic5x.h" 

AIC_PRIMARY aicPrimary; 

void main(void) 
{ 
    /* ... */ 

    aicPrimary._bitval.command = 0; 
    aicPrimary._bitval.data = 0; 

    initAic(NULL);               /* Initialize AIC just before */ 
    for(;;) {}                   /* entering the processing loop */ 
} 

void c_int5(void)                /* Serial port receive ISR (must be installed) */ 
{ 
    #define DRR ((volatile unsigned int *) 0x0020) 
    #define DXR ((volatile unsigned int *) 0x0021) 

    signed int recData; 
    signed int xmtData; 

    aicPrimary._intval = *DRR; 
    recData = aicPrimary._bitval.data; 

    /* Do processing from recData to xmtData */ 

    aicPrimary._bitval.data = xmtData; 
    *DXR = aicPrimary._intval; 
} 

How Can Comb Filters be Used to Synthesize 
Musical Instruments on a TMS320 DSP? 



Design Problem Music synthesis is the ability to create musical scores by 
synthesizing different musical instruments. Different methods of music synthesis 
include sampled sound synthesis (wavetable synthesis), FM synthesis, and 
instrument modeling. Sampled sound synthesis inherently requires significant 
amounts of memory to store instrument samples but results in extremely 
natural-sounding music. FM synthesizers are algorithmic and typically require 
little memory but result in unnatural-sounding music. Instrument models, based 
on analysis of the instrument being synthesized, often yield efficient 
implementations producing highly natural sounds. 

This document presents a DSP implementation of a model for synthesizing 
plucked strings, based on the Karplus Strong Plucked-String Synthesizer. 



Solution String-Synthesis Model 

The Karplus Strong string-synthesizer model produces extremely natural-sounding 
plucked strings. The model is based on an IIR comb filter, shown in Figure 1. 



Contributed by Leor Brenman 




[Figure: IIR comb filter block diagram with a z^-N delay line and feedback multipliers f]

Figure 1. Karplus Strong string-synthesizer model 

The input, x(n), is a burst of Gaussian white noise lasting N samples and is 
zero elsewhere. The pitch period of the synthesized plucked string is N times the 
sample period. The output, y(n), is the synthesized plucked string sound and is 
valid after the Nth input sample. The multipliers, f, must be less than or equal to 
0.5 for IIR stability, and determine the sustain of the synthesized string sound. A 
value of 0.5 represents the longest sustain. The timbre of the sound is similar to 
that of a plucked steel-string guitar. 

An alternate view of the model is to load the tap delay line with white noise 
and let the filter ring as long as desired with no input[x(n) = 0]. In this case, the 
output is valid as soon as the filter is started. 

The synthesized string sound has a frequency of fs/N, where fs is the system 
sampling frequency. This inverse dependence on N results in poor granularity for 
small N. That is, a unit change in N when N is small results in a large change in 
the frequency of the synthesized string. 
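
The granularity effect is easy to quantify. The following sketch simply evaluates fs/N for neighboring values of N; the function names are hypothetical and the 44.1 kHz rate in the usage note is the CD-quality rate mentioned elsewhere in this document.

```c
#include <assert.h>

/* Frequency in Hz of the synthesized string for delay length n. */
static double string_freq_hz(double fs_hz, int n)
{
    assert(n > 0);
    return fs_hz / (double)n;
}

/* Change in frequency, in Hz, when n is increased by one. */
static double freq_step_hz(double fs_hz, int n)
{
    return string_freq_hz(fs_hz, n) - string_freq_hz(fs_hz, n + 1);
}
```

At fs = 44.1 kHz, N = 50 gives 882 Hz and the step to N = 51 is about 17 Hz, while around N = 500 (88.2 Hz) a unit change in N moves the pitch by less than 0.2 Hz.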

This model, while simple, is surprisingly realistic; the burst of noise 
represents the plucking of the string, and the comb filter, which acts as a 
resonator, represents the resonating body of a string instrument such as an 
acoustic guitar or bass. This model is well suited for implementation on DSPs, 
which are designed for implementing digital filters. 
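
For experimenting with the model off-target, the "load the delay line with noise and let it ring" view can be sketched in C. This is a floating-point illustration, not the library code: it keeps a circular buffer of N+1 samples and mirrors the pointer updates of the 'C31 listing in Figure 2 (load, advance by one, add, step back by two, store, advance by two). The type and function names and the KS_MAX_LEN limit are hypothetical, and the caller is assumed to supply the noise samples.

```c
#include <assert.h>

#define KS_MAX_LEN 2048          /* arbitrary upper limit on N + 1 */

typedef struct {
    float buf[KS_MAX_LEN];       /* circular tap delay line */
    int   len;                   /* buffer length, N + 1 */
    int   pos;                   /* index of the y[n-N] sample */
    float f;                     /* sustain multiplier, at most 0.5 */
} ks_string_t;

/* Starts a new string: copies n + 1 noise samples into the delay line. */
static void ks_init(ks_string_t *s, int n, float f, const float *noise)
{
    int k;

    assert(s != 0 && noise != 0);
    assert(n >= 2 && n + 1 <= KS_MAX_LEN);
    assert(f > 0.0f && f <= 0.5f);
    s->len = n + 1;
    s->pos = 0;
    s->f = f;
    for (k = 0; k < s->len; k++)
        s->buf[k] = noise[k];
}

/* Produces one output sample and advances the delay line. */
static float ks_next(ks_string_t *s)
{
    float a, b, y;

    a = s->buf[s->pos];                        /* load y[n-N] */
    s->pos = (s->pos + 1) % s->len;
    b = s->buf[s->pos];                        /* add the adjacent tap */
    s->pos = (s->pos - 2 + s->len) % s->len;   /* slot that receives y[n] */
    y = s->f * (a + b);
    s->buf[s->pos] = y;                        /* store y[n] */
    s->pos = (s->pos + 2) % s->len;            /* next call's y[n-N] */
    return y;
}
```

With f = 0.5 the two-tap average has unity gain at DC, so a buffer of constant values rings forever; smaller values of f make every component decay, which is the sustain control described above.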

DSP Implementation 

This type of filter is easily implemented on Texas Instruments DSPs. Many TI 
DSPs support circular buffering, which facilitates implementing the tap-delay 
line of the comb filter efficiently. Also, they provide the necessary numerical 
processing speed to support the 44.1-kHz sampling rate required for CD-quality 
audio. On-chip peripheral support for analog-to-digital and digital-to-analog 
converters reduces chip count and system cost. 

For example, the TMS320C31 DSP, with a 33-75 ns instruction cycle time, 
circular buffer support, and on-chip serial port, is well-suited for implementing 
these types of algorithms. The following code segment implements one iteration 
(sample) of the string synthesizer model on the 'C31. 



; AR0 points to tap delay sample y[n-N] 
; BK  = pitch period value, N+1 
; R1  = sustain (0.5=long, 0.48=med, 0.45=short) 

        LDF   *AR0++(1)%,R0     ; Load y[n-N] 
                                ; AR0 points to y[n-(N-1)] 
        ADDF  *AR0--(2)%,R0     ; Add y[n-(N-1)] 
                                ; AR0 points to y[n] 
        MPYF  R1,R0             ; f*[y(n-N) + y(n-(N+1))] 
        STF   R0,*AR0++(2)%     ; Store R0 = y(n) 
                                ; AR0 points to y[n-(N-1)] 
                                ;   = (next) y[n-N] 
* R0 contains the return value, y(n) 

Figure 2. 



When a new plucked string is desired, a circular buffer of the appropriate size 
is set up and filled with white noise. The appropriate parameters, such as the pitch 
period and sustain factor, are set and remain fixed between calls to the 
string-synthesizer routine. Samples should be produced at 44.1 kHz for CD-quality music 
synthesis. 

As shown in the preceding code segment, the core of the algorithm can be 
executed in four 'C31 instruction cycles per output sample. The only overhead is 
setting up the parameters and loading the buffer with white noise when a new 
string is created, plus the interrupt service routine that writes the output to the 
'C31 serial port for delivery to the D/A converter. 

A C-callable version of the string synthesizer function is shown in Figure 3. A 
C-calling shell is shown in Figure 4. 

The TMS320 DSP BBS file KPSTRONG.EXE contains the files necessary to create 
a library of C-callable functions for implementing a string synthesizer using the 
algorithm described in this paper. It is written primarily in C and C-callable assembly 
language. A TMS320C30 EVM demo is also included. This code serves as an example 
of an instrument-modeling music synthesis algorithm as well as an example of 
implementing circular addressing in C and C-callable assembly language. 



* Strfunc.asm - Karplus Strong Base Model Plucked String Synthesizer 
*               implementation in TMS320C3x/'C4x assembly language. 
* 
* KPSTRING data structure: 
* 
* typedef struct kpString { 
*    DTYPE *tapDelay;       /* Tap delay buffer data pointer */ 
*    DTYPE sustainFactor;   /* Sustain factor, f <= 0.5 */ 
*    int   pitchPeriod;     /* Pitch period in samples = fs/fdesired */ 
*    DTYPE *tapDelayBase;   /* Tap delay buffer base pointer */ 
*    int   currLoc;         /* Index points to delay[N+1] */ 
*    DTYPE *noise;          /* Noise buffer used to initialize string */ 
* } KPSTRING; 
* 
* Function prototype: 
* 
* float string(KPSTRING *st, float *out, int nsamples); 
* 
FP      .set    AR3 
        .global _string 
_string 
* 
* Stack manipulation 
* 
        PUSH    FP 
        LDI     SP,FP 
        LDI     *-FP(2),AR2     ; AR2 points to a KPSTRING data structure 
        LDI     *-FP(3),AR1     ; AR1 points to output[0] 
        LDI     *-FP(4),RC      ; RC = nsamples 
* 
* Setup for circular buffer fetches and loop 
* 
        LDI     *+AR2(0),AR0    ; AR0 points to y[n-N] 
        LDF     *+AR2(1),R1     ; R1 = decay (0.5=long, 0.48=med, 0.45=short) 
        LDI     *+AR2(2),BK     ; BK = pitch = N = BufferSize-2 
        ADDI    2,BK            ; BK = BufferSize 
        SUBI    1,RC            ; RC = 1 less than # iterations for RPT 

Figure 3. C-callable version of string synthesizer function 



* 
* Implement Comb buffer 
* 
        RPTB    STLOOP          ; loop over n 
        LDF     *AR0++(1)%,R0   ; R0 = y(n-N) 
                                ; AR0 points to y[n-(N-1)] 
        ADDF    *AR0--(2)%,R0   ; R0 = y(n-N) + y[n-(N-1)] 
                                ; AR0 points to y(n) 
        MPYF    R1,R0           ; R0 = f*[y(n-N) + y(n-(N-1))] 
        STF     R0,*AR0++(2)%   ; R0 = y(n), store result in delay line 
                                ; AR0 points to y[n-(N-1)] 
                                ;   = (next) y(n-N) 
* Uncomment these two lines if this routine should ADD its calculated 
* output to the output stream; otherwise the output is overwritten 
*       LDF     *AR1,R2         ; Get output buffer data value 
*       ADDF    R2,R0           ; Add calculated value 
STLOOP: 
        STF     R0,*AR1++(1)    ; Store y(n) to output buffer 



* 
* Restore circular buffer pointer 
* 
***     STI     AR0,*+AR2(0)    ; restore circular buffer pointer, done in 
                                ; delayed branch below 
* 
* Return 
* 
        LDI     *-FP(1),R1      ; load return address 
        BD      R1              ; branch back 
        LDI     *FP,FP          ; restore Frame Pointer 
***     NOP 
        STI     AR0,*+AR2(0)    ; restore circular buffer pointer 
        SUBI    2,SP            ; Restore Stack Pointer 
***     B       R1              ; Branch occurs here 

Figure 3. continued 






#include <kpstring.h> 

/*******************************************************************/ 
/* MAIN()                                                          */ 
/*******************************************************************/ 
/* NoteOn, currPitchPer, output, blockSize, amplitude, and i are   */ 
/* assumed to be declared elsewhere; this is only a calling shell. */ 

void main(void) 
{ 
    float *noise; 
    KPSTRING cs; 

    makeNoise(&noise, 100, 2); 
    createString(&cs, 50, noise); 
    cs.sustainFactor = 0.49; 

    for(;;) 
    { 
        if(NoteOn) initString(&cs, currPitchPer); 
        string(&cs, output, blockSize); 
        for (i=0;i<blockSize;i++) output[i] = amplitude*output[i]; 
    } 
} 

Figure 4. C-calling shell 
TMS320C5x DSK Board 



Contributed by Gerald CapuieR 



Design Problem In some instances, my DSK applications do not run properly. At other times the 
application seems to run, but nothing happens. For example, the TRY1.ASM code 
in the User's Guide does not work. 

Solution You are seeing the effects of an uninitialized DSK board. The try1 example works 
correctly only if you initialize the DSK before entering the try1 routine. In fact, the 
most important task you must perform before starting ANY application is DSK 
initialization. Initialization is done in software preceding the entry to your 
application and is required only if the Analog Interface Circuit (AIC) is to be used. 
Three things must be initialized: 

1) The TMS320C50 on-chip timer, 

2) The 'C5x serial port, 

3) The Analog Interface Circuit (AIC). 

The AIC is the device that interfaces the outside analog world to the DSK's 
internal digital world. The AIC is connected to the DSP through the 'C50 serial 
port as shown below in Figure 1. 



[Block diagram. TMS320C50 pins shown: TOUT, FSR, DR, FSX, DX, CLKX, CLKR. 
TLC32040 AIC pins shown: RESET, MCLK, FSR, DR, FSX, DX, SCLK, IN+, IN-, 
AUXIN+, AUXIN-, OUT+, OUT-.] 

Figure 1. Hardware connection between the 'C50 and AIC 
In order to communicate with (initialize) the AIC, the programmer must first 
initialize the DSP on-chip timer to provide the AIC with its master clock and then 
initialize the serial port. Initialization has to be performed only once each 
time the DSK is powered up. This explains why try1.asm does not work as the 
first application after power up, but does work after you have executed 
FUNC.DSK or some other demo that includes the initialization routine. 

The easiest way to include the initialization routine is to extract the AICINIT 
subroutine from a DSK demo (FUNC.ASM, for example), append it to your 
application, and CALL the subroutine. Be sure to include the constants TA, RA, TB, 
RB, and AIC_CTR. These constants are used to set the AIC sampling frequency, 
filtering, etc., and are explained later. If you wish to create your own 
initialization routine, continue reading and follow the steps below. 

TMS320C50 On-Chip Timer 

As you can see from Figure 1, the internal 'C50 timer output TOUT is used to supply 
the AIC with its master clock (MCLK). The TOUT pulse is activated each time 
the DSP's period counter decrements to zero. Refer to page 5-45 of the 'C5x 
User's Guide for the formula used to calculate the TOUT rate. The maximum 
TOUT rate is obtained by minimizing the denominator (min = 2), which makes 
the highest TOUT rate one-half the internal 'C50 machine-cycle rate*, or 
10 MHz. Assembly code for generating the maximum TOUT rate is as follows: 

SPLK #01h, PRD ; Load PRD reg. for period of 100 ns -TDDR=0 

SPLK #20h, TCR ; Re-load and begin timer. 

Upon execution of the second SPLK instruction, TOUT will begin generating 
a 10-MHz squarewave. 
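
The arithmetic behind the 10-MHz figure can be checked with a short C sketch. It assumes the commonly quoted 'C5x timer relation, rate = machine-cycle rate / ((TDDR + 1) x (PRD + 1)); the text refers to the User's Guide for the exact formula, so treat the expression and the function name below as assumptions.

```c
#include <assert.h>

/* Assumed timer relation: rate = cycle rate / ((TDDR + 1) * (PRD + 1)). */
double tout_rate_hz(double cycle_rate_hz, unsigned tddr, unsigned prd)
{
    double divisor = ((double)tddr + 1.0) * ((double)prd + 1.0);
    assert(divisor >= 2.0);   /* the text gives 2 as the minimum denominator */
    return cycle_rate_hz / divisor;
}
```

With a 20-MHz machine-cycle rate, TDDR = 0, and PRD = 1 (the values loaded by the SPLK instructions above), the sketch returns 10 MHz.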

Serial-Port Initialization 

The 'C50 serial port must be initialized by modifying the Serial-Port-Control 
register (SPC) of the DSP. The SPC is described on page 5-18 of the 'C5x User's 
Guide. In order to transmit and receive data from the AIC, the serial port must 
be set for Frame Sync Mode (FSM = 1) and reset by writing 0s to the XRST and 
RRST bits. A second write to the SPC, with the XRST and RRST bits 
set to 1, then brings the serial port out of reset. PLEASE NOTE: A TOTAL 
OF TWO WRITES SHOULD BE MADE TO THE SPC TO RESET OR 
RECONFIGURE THE SERIAL PORT. The following code will initialize the serial 
port correctly: 

SPLK #08h, SPC ; FSM=1, XRST and RRST = 00 

SPLK #0C8h, SPC ; FSM=1, XRST and RRST = 11 

After initializing the serial port, it is suggested a dummy word be sent to the 
DXR in order to clear any unwanted data from the serial-port registers. 
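
The two SPC constants can be derived from named bit masks rather than memorized. In the C sketch below, the bit positions (FSM = bit 3, XRST = bit 6, RRST = bit 7) are inferred from the values 08h and 0C8h used above, and the identifiers are hypothetical.

```c
#include <assert.h>

/* Bit positions inferred from the constants 08h and 0C8h in the text. */
enum {
    SPC_FSM  = 1 << 3,   /* frame sync mode        */
    SPC_XRST = 1 << 6,   /* transmit reset (0 = in reset) */
    SPC_RRST = 1 << 7    /* receive reset  (0 = in reset) */
};

/* First write: frame sync mode selected, both sections held in reset. */
unsigned spc_reset_word(void)
{
    return SPC_FSM;
}

/* Second write: same mode, both sections released from reset. */
unsigned spc_run_word(void)
{
    unsigned w = SPC_FSM | SPC_XRST | SPC_RRST;
    assert((w & ~0xFFFFu) == 0);   /* must fit the 16-bit register */
    return w;
}
```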

AIC Initialization 

Once the AIC is supplied with MCLK and the 'C50 serial port is initialized, a 
reset of the AIC should be performed to force the AIC into a known state. The 
RESET line of the AIC is connected to the BR (Bus Request) pin of the 'C50. 
The BR pin is driven low when external global memory is accessed. Therefore, to 
reset the AIC, we must define global memory and access it. Refer to page 6-29 of the 
'C5x User's Guide for more information about configuring global memory (GREG 
register). The code below illustrates how to initialize global memory and assert BR 
to reset the AIC. 

*NOTE: 'C50-40 internal machine cycle is 20 MHz 



        LACC  #80h            ; init 8000h-FFFFh as global memory 
        SACL  GREG            ; Store to Global Memory Alloc Reg. 
        LAR   AR0, #0FFFFh    ; Use AR0 to point to location FFFFh 
        RPT   #10000          ; Access global memory 10,000 times 
        LACC  *, 0, AR0       ; to drive pin low for duration. 
        SACH  GREG            ; Restore GREG to 0000 



As a result of the reset, the AIC is forced into a known stable state. At this point, 
AIC initialization of sample rates, etc. can be performed. Sampling rates are 
determined by the values in the A and B registers of the AIC's transmit and receive 
sections. Tx counter A and Tx counter B determine the D/A conversion timing, 
whereas Rx counter A and Rx counter B determine the A/D conversion timing. 
Page B-11 of the 'C5x DSK User's Guide illustrates that for an 8-kHz conversion 
rate (transmit), the MCLK is always divided by TA, by 2, and then by TB. Therefore, 
the conversion frequency is equal to 10.000 MHz/(TA x 2 x TB). If TA is 17, then TB 
must be roughly 37. 



10.000 MHz / (17 x 2 x 37) ≈ 8.0 kHz 



TA is chosen to be 17 so that MCLK/(2 x TA) is as close as possible to 288 kHz. 
It will be explained later why this is important. The same rules apply to the receive 
(Rx) section. Please remember that the example in Appendix B of the 'C5x DSK 
User's Guide is based on an MCLK of 10.368 MHz. 
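
The divider arithmetic above can be sketched in C. The function names are hypothetical; the formulas are the ones given in the text (MCLK divided by 2 x TA, then by TB).

```c
#include <assert.h>

/* Switched-capacitor filter clock: MCLK / (2 * TA). */
double aic_scf_hz(double mclk_hz, int ta)
{
    assert(ta > 0);
    return mclk_hz / (2.0 * (double)ta);
}

/* Conversion (sampling) frequency: MCLK / (2 * TA * TB). */
double aic_conversion_hz(double mclk_hz, int ta, int tb)
{
    assert(tb > 0);
    return aic_scf_hz(mclk_hz, ta) / (double)tb;
}

/* TB value that brings the conversion frequency closest to target_hz. */
int aic_nearest_tb(double mclk_hz, int ta, double target_hz)
{
    assert(target_hz > 0.0);
    return (int)(aic_scf_hz(mclk_hz, ta) / target_hz + 0.5);
}
```

For MCLK = 10 MHz, TA = 17, and an 8-kHz target, `aic_nearest_tb` returns 37, and the resulting conversion frequency is slightly below 8 kHz (about 7.95 kHz).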

To initialize the analog chip, the AIC uses primary and secondary 
communications. Please refer to pages B-14 and B-15 of the 'C5x DSK User's 
Guide. Primary communications are used to load the TA, TB, RA, and RB 
counters. Secondary communication is used to load a value into the AIC control register 
and to load the TA, TB, RA, and RB registers. The register values are loaded into the 
counters each time the counter decrements to zero (I/O new sample). 



Primary Serial Communication Protocol 

  LSBs (d1 d0)   Counter loading at the next sample time 
  00             Txa=TA, Rxa=RA, Txb=TB, Rxb=RB 
  01             Txa=TA+TA', Rxa=RA+RA', Txb=TB, Rxb=RB 
  10             Txa=TA-TA', Rxa=RA-RA', Txb=TB, Rxb=RB 
  11             Txa=TA, Rxa=RA, Txb=TB, Rxb=RB; 
                 initiates secondary comm protocol 

Secondary Serial Communication Protocol 

  LSBs (d1 d0)   Function 
  00             loads the A registers (TA, RA) 
  01             loads the A' registers (TA', RA') 
  10             loads the B registers (TB, RB) 
  11             loads the control register from bits d2-d7 

Register values are unsigned binary. 

Control Register Bit Definitions: 

d2 = 0/1 deletes/inserts the bandpass filter 
d3 = 0/1 disables/enables the loopback function 
d4 = 0/1 disables/enables AUX IN+ and AUX IN- pins 
d5 = 0/1 asynchronous/synchronous transmit and receive sections 
d6 = 0/1 gain control bits 
d7 = 0/1 gain control bits 

Figure 2. AIC primary and secondary communication protocols 



The primary communication is defined by the two LSBs of the data word. For 
example, if the two LSBs are 00, then every time the counters decrement to zero 
(the next sample time) the A registers are loaded into the A counters and the B 
registers are loaded into the B counters. However, if the primary communication 
word LSBs are 01 or 10, then the A counters are loaded with A+A' or A-A', 
respectively. Counter B is always loaded with the B register. TA' and RA' are 
registers that can be used to advance or retard the sampling frequency by shortening 
or lengthening the sample period. This feature can be used to increase the 
signal-to-noise performance and is particularly useful in modem applications. 

Secondary communication is initiated when the primary communication LSBs 
are 11. Secondary protocol allows you to load the A, B, and A' registers and to 
enable/disable other internal features of the AIC. As you can see from the above 
figure, the LSBs of the secondary communication word must be 00, 01, or 10 in 
order to load the A, A', and B registers, respectively. If the LSBs are 11, then bits 
2-7 are used to initialize/alter the control register. 
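
The role of the two LSBs can be illustrated with a small C sketch. The type and function names are hypothetical, and placing a 14-bit sample in the upper bits of a 16-bit word is an assumption made for illustration; the text states only that the two LSBs select the primary-communication behavior.

```c
#include <assert.h>

/* Meaning of the two LSBs of a primary communication word (per the text). */
typedef enum {
    AIC_LOAD_A        = 0,   /* A counters <- A registers            */
    AIC_LOAD_A_PLUS   = 1,   /* A counters <- A + A'                 */
    AIC_LOAD_A_MINUS  = 2,   /* A counters <- A - A'                 */
    AIC_SECONDARY     = 3    /* as code 0, and starts secondary comm */
} aic_primary_code;

/* Assumed layout: 14-bit sample in the upper bits, code in the two LSBs. */
unsigned aic_primary_word(unsigned sample14, aic_primary_code code)
{
    assert(sample14 <= 0x3FFFu);
    return ((sample14 << 2) | (unsigned)code) & 0xFFFFu;
}

/* Extract the code from a received or prepared word. */
aic_primary_code aic_primary_decode(unsigned word)
{
    return (aic_primary_code)(word & 3u);
}
```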

The control register provides a way to enable the auxiliary input (AUXIN), 
insert/delete the bandpass filter, change the input gain, and other features. 
Note: The gain is an input gain (pre-amplification) and not an output gain. The 
output gain is always 1. The input gain can be changed by setting bits 6 and 7 of 
the AIC control register to one of the following configurations: 



  Bit 7   Bit 6   Gain (preamp) 
  0       0       1 
  0       1       2 
  1       0       4 
  1       1       1 



The bandpass filter can be inserted or deleted by setting bit 2 of the control 
word to a one or a zero. The frequency response of this filter is shown in Appendix 
B of the 'C5x DSK User's Guide and is based upon a switched-capacitor filter 
(SCF) clock frequency of 288 kHz. The SCF is equal to MCLK/(2 x TA), where MCLK is 
fixed at 10 MHz. As a result, it is impossible to achieve an SCF of exactly 288 kHz, 
and the frequency response of the filter is scaled by the ratio of the 
actual SCF to 288 kHz. The ratio closest to 1:1 is obtained 
when TA = 17 (SCF = 294 kHz). 
SCF = MCLK / (2 x Counter A) 

Conversion frequency = SCF / (Counter B) 

Shift clock frequency = MCLK / 4 

The shift clock frequency, shown above, is the rate at which the data is shifted 
from the AIC through the 'C50's serial port. This is always 2.5 MHz, since MCLK is 
fixed at 10 MHz. 

Conclusion 

When initializing the hardware, it is safest to create the subroutine and ALWAYS 
call it before trying to use the AIC. After you have created your subroutine, it can be 
cut and pasted into other applications. An alternative is to use the subroutine 
included with the DSK demo programs. The AICINIT subroutine is executed every 
time in each demo. In fact, the subroutine is written so that changes can easily be 
made to the AIC's Tx, Rx, and control registers simply by changing the values 
named TA, TB, RA, RB, and AIC_CTR located at the top of the file. 

Another Hint: When entering an Interrupt Service Routine (ISR), the DSP's 
interrupts are automatically disabled. Since the debugger uses INT2 to communicate 
with the PC (and vice versa), be sure to enable INT2 as one of the first tasks in the 
ISR! HAVE FUN! 
Contributed by Ted Fried, Advanced Computer Communications 



Design Problem How do I develop a full-duplex, asynchronous serial interface using the on-chip 
resources of the TMS320C3x? 

Solution By using the general-purpose I/O pins in conjunction with two timers and an exter- 
nal interrupt, one can develop a very flexible full-duplex UART in software. This 
solution discusses the implementation of an interrupt-driven, 9,600-baud UART 
with 8 data bits, 1 stop bit, and no parity. 

Hardware: 

The hardware interface is relatively straightforward. The receive line is connected to 
both the INT0 and IOF1 pins. This will trigger an interrupt on the falling edge of 
the start bit. The transmit line is connected to the IOF0 pin and a pullup resistor. 

Software: 

The receive sequence begins when the start bit triggers the external interrupt. In 
the interrupt service routine, timer0 is loaded with a value that results 
in a delay of one-half of the bit time. The routine then loads the timer's interrupt 
vector, enables it, and exits to the main program. When the timer triggers its 
interrupt, the main body of the receive code is run. At this time, the line should be 
in the middle of the start bit. The CPU samples IOF1 and verifies that the start 
bit has been read in. If not, the routine reenables the external interrupt and exits to 
the main program. If the start bit is verified, the timer is loaded with the full-bit 
time and started. The procedure then exits to the main program. 
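
The timer values for the half-bit and full-bit delays follow from the baud rate. The C sketch below assumes that the 'C3x timer counts at one quarter of the input clock frequency; under that assumption a 33-MHz device at 9,600 baud gives the constants 01ADh and 035Bh used in the listing that follows. The function names are hypothetical.

```c
#include <assert.h>

/* Timer counts in one full bit time, assuming the timer increments
   at clkin_hz / 4 (an assumption, not stated in the text). */
unsigned long uart_bit_counts(unsigned long clkin_hz, unsigned long baud)
{
    assert(baud > 0);
    return clkin_hz / (4UL * baud);
}

/* Timer counts in half a bit time, used to reach mid-start-bit. */
unsigned long uart_half_bit_counts(unsigned long clkin_hz, unsigned long baud)
{
    assert(baud > 0);
    return clkin_hz / (8UL * baud);
}
```

Sampling half a bit time after the falling edge places every later full-bit sample near the center of its bit, which gives the most margin against clock mismatch between the two ends of the link.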

On successive timer0 interrupts, the received bits are shifted into a storage 
area in memory until a byte is read in. On the 9th interrupt, if the stop bit is 
verified, the routine will execute a software trap to inform the main program of the byte 
reception. If the stop bit is not verified, the BAD_STOP_BIT subroutine is called, 
where the appropriate action is taken. After the received byte is processed, the 
external interrupt is reenabled and the system waits for the next start bit. 

The transmit routine begins when the main program loads a byte into the 
holding register and then calls TX_MAIN. This procedure loads timer1 with the full-bit 
time value, resets the transmit counter, sets the start bit, and enables the timer's 
interrupt. The routine then exits back to the main program. The 
main program must not call for another byte transmit until it finds the transmit 
counter equal to 0. On each subsequent timer1 interrupt, the routine will shift 
out the transmit byte, including the stop bit, until the transmit counter is zero. 
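
The transmit framing can be modeled in C. This is a sketch under stated assumptions, not the author's code: the function name is hypothetical, data bits are taken to go out least-significant bit first (as a rotate-right-through-carry implies), and the upper bits of the holding word are set to 1 so that the stop bit shifts out as a 1.

```c
#include <assert.h>

/* Line level for slot i of one frame:
   slot 0 = start bit (0), slots 1-8 = data bits LSB first,
   slot 9 = stop bit (1). */
int uart_tx_level(unsigned byte, int slot)
{
    assert(slot >= 0 && slot <= 9);
    if (slot == 0)
        return 0;                                   /* start bit */
    unsigned shifted = (byte & 0xFFu) | 0xFF00u;    /* stop bits above data */
    return (int)((shifted >> (slot - 1)) & 1u);
}
```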

half_bit_time    .set   01ADh          ; assume 33-MHz TMS320C3x 
whole_bit_time   .set   035Bh 
timer_go         .set   03C1h 
timer_setup      .set   0301h 
int_setup        .set   0301h 
iof_setup        .set   06h 

timer0_vector    .word  RX_TMR_INT 
timer1_vector    .word  TX_INT 
rx_int_vector    .word  RX_INT0 
timer0_period    .word  0808028h 
timer1_period    .word  0808038h 
timer0_control   .word  0808020h 
timer1_control   .word  0808030h 
timer0_int_vect  .word  0809FC9h       ; interrupt vector addresses 
timer1_int_vect  .word  0809FCAh 
int0_vector      .word  0809FC1h 
rx_byte          .word  0809FF8h       ; on-chip RAM locations 
tx_byte          .word  0809FF9h 
rx_counter       .word  0809FFAh 
tx_counter       .word  0809FFBh 

; Main setup for asynchronous serial interface to be run at powerup. 



SETUP_ASYNCH: 
        PUSH  AR7 
        OR    iof_setup, IOF          ; iof setup and iof0=1 
        LDI   timer_setup, AR7        ; setup timer0 and timer1 
        STI   AR7, @timer0_control 
        STI   AR7, @timer1_control 
        LDI   rx_int_vector, AR7      ; load into interrupt vector 
        STI   AR7, @int0_vector 
        OR    int_setup, IE           ; enable interrupts 
        POP   AR7 
        RETS 







; Start bit received, external interrupt service routine. 

RX_INT0: 
        PUSH  AR7 
        XOR   01h, IE                 ; disable INT0 
        LDI   half_bit_time, AR7 
        STI   AR7, @timer0_period     ; rx_timer period 
        LDI   timer0_vector, AR7 
        STI   AR7, @timer0_int_vect   ; rx_timer int vector 
        LDI   timer_go, AR7 
        STI   AR7, @timer0_control    ; start rx_timer 
        LDI   0Ah, AR7 
        STI   AR7, @rx_counter        ; reset rx_counter 
        POP   AR7 
        RETI 
; Timer interrupt service routine for byte reception. 

RX_TMR_INT: 
        PUSH  AR7 
        LDI   @rx_counter, AR7 
        CMPI  09h, AR7                ; are we at start bit? 
        BNE   STOP                    ; nope, check for stop bit 
        CMPI  080h, IOF               ; check rx_bit (IOF1) 
        BLT   OK                      ; if less than 80h (IOF1=0)? 
        OR    01h, IE                 ; bad start bit, reenable INT0 
        BR    CLEANUP2                ; go back to main 
OK:     SUBI  01h, AR7                ; decrement rx_counter 
        STI   AR7, @rx_counter        ; update counter in memory 
        LDI   whole_bit_time, AR7 
        STI   AR7, @timer0_period     ; load bit time into rx_timer 
        LDI   timer_go, AR7 
        STI   AR7, @timer0_control    ; start rx_timer 
        POP   AR7 
        RETI 
STOP:   PUSH  AR6 
        LDI   @rx_byte, AR6 
        DBNZ  AR7, NEXT               ; if rx_count != 0, get next bit 
        CMPI  080h, IOF               ; check rx_bit (IOF1) 
        BLT   BAD_STOP_BIT            ; GO TO INVALID STOP BIT MODULE 
        LSH   -24, AR6                ; shift rx_byte 24 bits right 
        STI   AR6, @rx_byte           ; update rx_byte in memory 
        TRAPU BYTE_RECEIVED           ; TRAP RECEIVED BYTE!! 
        OR    01h, IE                 ; reenable INT0 
        BR    CLEANUP 
NEXT:   CMPI  080h, IOF               ; check rx_bit (IOF1) 
        OR    01h, ST                 ; force carry flag to 1 
        BGE   ONE                     ; if rx_bit = 1 
        XOR   01h, ST                 ; set carry flag to 0 
ONE:    RORC  AR6                     ; shift in carry bit 
        STI   AR6, @rx_byte           ; update rx_byte in memory 
        STI   AR7, @rx_counter        ; update counter in memory 
        LDI   timer_go, AR6 
        STI   AR6, @timer0_control    ; start rx_timer 
CLEANUP: 
        POP   AR6 
CLEANUP2: 
        POP   AR7 
        RETI 



; Transmit byte main subroutine. 

TX_MAIN: 
        PUSH  AR7 
        LDI   whole_bit_time, AR7 
        STI   AR7, @timer1_period     ; load timer period 
        LDI   timer1_vector, AR7 
        STI   AR7, @timer1_int_vect   ; tx_timer int vector 
        LDI   @tx_byte, AR7 
        OR    0FF00h, AR7             ; mask stop bit to tx_byte 
        STI   AR7, @tx_byte           ; update tx_byte 
        AND   0FBh, IOF               ; send out '0' to IOF0 
        LDI   0Ah, AR7 
        STI   AR7, @tx_counter        ; load counter in memory 
        LDI   timer_go, AR7 
        STI   AR7, @timer1_control    ; start tx_timer 
        POP   AR7 
        RETS 



; Timer 1 interrupt service routine for byte transmission. 

TX_INT: 
        PUSH  AR7 
        LDI   @tx_counter, AR7        ; load in tx_counter from mem 
        DBNZ  AR7, NEXT_OUT           ; if tx_counter not zero 
        POP   AR7 
        RETI 
NEXT_OUT: 
        PUSH  AR6 
        LDI   timer_go, AR6           ; use AR6 so the count in AR7 survives 
        STI   AR6, @timer1_control    ; start tx_timer 
        LDI   @tx_byte, AR6           ; load in tx_byte from mem 
        RORC  AR6                     ; next bit out is in carry 
        BNC   OUT_ZERO                ; carry=0, then send out '0' 
        OR    04h, IOF                ; send out '1' to IOF0 
        BR    CLEANUP3 
OUT_ZERO: 
        AND   0FBh, IOF               ; send out '0' to IOF0 
CLEANUP3: 
        STI   AR6, @tx_byte           ; update byte in memory 
        STI   AR7, @tx_counter        ; update counter in memory 
        POP   AR6 
        POP   AR7 
        RETI 
TMS320 DSP Designer's Notebook 

Texas Instruments 



Designing Macros for the TMS320C5x 



Contributed by Jay Reimer 



Design Problem 



What are the tricks to creating a macro that supports all the normal addressing 
modes for its parameter fields? 

How can I be sure that I'm using the correct instruction (CRGT/CRLT) when I 
want to apply a lower/upper limit to a variable? 

The TMS320 Assembler includes a powerful macro capability. It provides an 
effective way to generate a macro function that appears to be a normal instruction, 
including support of all the normal addressing modes. A lower-limit macro 
is used here to demonstrate the capability. The lower-limit macro has 
the characteristics typical of many of the native 'C5x instructions. 
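
The central decision the macro makes is which addressing mode its first parameter uses. The C sketch below mirrors that decision for illustration only: the names are hypothetical, numeric parsing uses C conventions rather than assembler syntax, and an immediate operand that is not a plain number is simply treated as a long immediate.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum {
    OP_DIRECT, OP_INDIRECT, OP_SHORT_IMMED, OP_LONG_IMMED, OP_INVALID
} operand_kind;

/* Classify an operand string by its first character, as the macro does. */
operand_kind classify_operand(const char *limit)
{
    assert(limit != NULL);
    if (limit[0] == '*')
        return OP_INDIRECT;                    /* indirect: starts with '*' */
    if (limit[0] == '#') {
        char *end;
        long v;
        if (limit[1] == '\0')
            return OP_INVALID;                 /* '#' with nothing after it */
        v = strtol(limit + 1, &end, 0);
        if (*end != '\0')
            return OP_LONG_IMMED;              /* symbolic or non-C number */
        return (v >= 0 && v <= 255) ? OP_SHORT_IMMED : OP_LONG_IMMED;
    }
    if (isalpha((unsigned char)limit[0]) || limit[0] == '_')
        return OP_DIRECT;                      /* a symbol name */
    return OP_INVALID;
}
```

Values from 0 to 255 map to the short immediate form and everything else after '#' to the long form, which is the same split the macro makes before choosing between the two load instructions.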



llimit  .macro  limit, shift, nextar 
        .nolist 
;/============================================================// 
;/ FILE INFORMATION                                           // 
;/                                                            // 
;/ (C) Copyright 1993 Texas Instruments. All rights reserved. // 
;/ Use of copyright notice is precautionary and does imply    // 
;/ publication.                                               // 
;/                                                            // 
;/============================================================// 
;/ 
;/ File     : llimit.asm 
;/ 
;/ Comments : Perform a lower limit test. The result will be 
;/            in both the accumulator and the accumulator 
;/            buffer on exit. 
;/ 
;/            The accumulator buffer must contain the value to 
;/            be tested upon entry. A total of three 
;/            parameters may be passed to llimit. These 
;/            parameters must satisfy the syntax for the 
;/            lacc/lacl instructions. The limit parameter may 
;/            be an immediate value or an address in data 
;/            memory containing the value. If the limit 
;/            parameter is provided as a direct address or as 
;/            an indirect address, the data page pointer or the 
;/            ARP, respectively, must be properly set on entry. 
;/            In any case, the limit will be loaded into the 
;/            accumulator and accumulator buffer compared. 

Figure 1: llimit.asm file listing 
;/            The greater of the two values, either the original or the 
;/            lower limit, will be loaded into both the accumulator and 
;/            the accumulator buffer. 
;/ 
;/ Usage    : direct:      [label] LLIMIT dma [,shift] 
;/            indirect:    [label] LLIMIT {ind} [,shift [,nextARP]] 
;/            long immed:  [label] LLIMIT #lk [,shift] 
;/            short immed: [label] LLIMIT #k 
;/ 
;/ History  : 12/16/93 - created - Jay Reimer 



; create macro substitution symbols and assign them components of the 
; limit parameter passed to the macro 
        .var    tmp1,tmp2,len,shft,nxt 
; copy the first character of limit to tmp1 
        .asg    :limit(1):, tmp1 
; get the length of the limit string 
        .asg    $symlen(limit), len 
; copy the remaining characters of limit to tmp2 
        .asg    :limit(2,len):, tmp2 
; if nextar is a non-zero length string 
        .if     ($symlen(nextar)) 
; copy nextar to nxt with a leading comma 
        .asg    ",:nextar:", nxt 
        .endif 
; if shift is a non-zero length string 
        .if     ($symlen(shift)) 
; copy it to shft with a leading comma 
        .asg    ",:shift:", shft 
; otherwise, if nxt is a non-zero length string 
        .elseif ($symlen(nxt)) 
; add a leading ",0" to nxt 
        .asg    ",0:nxt:", nxt 
        .endif 

; generate the appropriate load accumulator instruction (lacc|lacl) 
; followed by the compare instruction (crgt) 

; use direct addressing if limit is a symbol 
        .if     ($isname(limit)) 
        .list 
        lacc    :limit::shft: 
        .nolist 
; use indirect addressing if the first character of limit is "*" 
        .elseif ($symcmp(tmp1,"*")==0) 
        .list 
        lacc    :limit::shft::nxt: 
        .nolist 
; use immediate addressing if the first character of limit is "#" 
; if the value is 0-255 use short immediate addressing 
        .elseif (($symcmp(tmp1,"#")==0)&((tmp2&0ffh)==tmp2)) 
        .list 
        lacl    :limit: 
        .nolist 
; otherwise, use long immediate addressing 
        .elseif ($symcmp(tmp1,"#")==0) 
        .list 
        lacc    :limit::shft: 
        .nolist 
        .else 
        .list 
        .emsg   "ERROR - invalid macro parameter to llimit." 
        .mexit 
        .endif 
        .list 
        crgt 
        .endm 

Figure 1: llimit.asm file listing (continued) 
TMS320 DSP Designer's Notebook 

Number 60 

Texas Instruments 

ig Status and Control Fields and I/O Ports in the 
TMS320CXX HLL 

Contributed by Jay Reimer 



Design Problem How can I observe the individual fields in the status registers in a convenient way? 

As an example, I want to observe the present or past auxiliary register pointer 
( ARP) of the TMS320C5x when I stop at a breakpoint or single step. 

How can I read and write the I/O ports on the 'C5x while using the HLL debug- 
ger without writing 'C5x assembly code? 

Solution The ARP along with the data page pointer (DP), carry bit (C), and a variety of 
other fields are located in the status registers of the 'C5x. These registers are dis- 
played in the CPU register window of the HLL debugger but it is inconvenient to 
observe the value of any one individual field. 

The HLL debugger has two features that can help display the desired field(s) in 
an easy-to-observe manner. These two features are the ALIAS command and the 
WATCH window. 

The ALIAS command and the WATCH window are also useful in reading and 
writing I/O ports without having to write 'C5x assembly code. This can be especially 
useful during debug when you want to observe an input port value at some arbitrary 
point in your code or you need to write a new value to an output port. 

The examples in this report provide a list of alias commands that can be useful in 
displaying the fields of interest and alias commands that are useful in manipulating 
I/O ports. These examples are specific to the 'C5x, but can easily be adapted for any 
of the other generations of TMS320 DSPs. 

Since the debugger does not support access to status register fields natively, each 
access requires writing an expression. To avoid having to rewrite these expressions in each 
debug session, the aliases were written and stored in a file, "regs.cmd". If you add 

take regs.cmd,0 

to your evminit.cmd file, these aliases will be loaded each time you invoke the 
debugger. (The ,0 suppresses echo of the aliases in the command window during 
load.) 
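
Each alias is an instance of the same shift-and-mask pattern. The C sketch below shows that pattern in isolation; the function names are hypothetical, and the example positions (DP in bits 0-8, ARP in bits 13-15 of ST0) are taken from the alias listing.

```c
#include <assert.h>

/* Read a field: shift it down to bit 0, then mask to its width. */
unsigned field_get(unsigned reg, unsigned shift, unsigned mask)
{
    return (reg >> shift) & mask;
}

/* Write a field of a 16-bit register: clear the old bits, insert the new. */
unsigned field_set(unsigned reg, unsigned shift, unsigned mask, unsigned value)
{
    assert(shift < 16);
    unsigned cleared = reg & ~(mask << shift) & 0xFFFFu;
    return cleared | ((value & mask) << shift);
}
```

For example, `field_get(st0, 13, 7)` corresponds to the ?-ARP alias, and `field_set(st0, 13, 7, 4)` corresponds to "e-ARP 4".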
You use these alias commands in the following ways: 

wa-DP           ; add watch for DP 
wa-ARP          ; add watch for ARP 

?-DP            ; echo DP value in command window 
?-XF            ; echo XF value in command window 

e-ARP 4         ; change ARP value to 4 
e-TC 1          ; change TC value to 1 





; Status Register 0 (ST0) Fields 

alias wa-DP,    "wa ST0&0x1ff,DP" 
alias wa-INTM,  "wa (ST0>>9)&1,INTM" 
alias wa-OVM,   "wa (ST0>>11)&1,OVM" 
alias wa-OV,    "wa (ST0>>12)&1,OV" 
alias wa-ARP,   "wa (ST0>>13)&7,ARP" 

alias ?-DP,     "? ST0&0x1ff" 
alias ?-INTM,   "? (ST0>>9)&1" 
alias ?-OVM,    "? (ST0>>11)&1" 
alias ?-OV,     "? (ST0>>12)&1" 
alias ?-ARP,    "? (ST0>>13)&7" 

alias e-DP,     "e ST0 = (ST0&0xfe00)+(%1 & 0x1ff)" 
alias e-INTM,   "e ST0 = (ST0&0xfdff)+((%1 & 1)<<9)" 
alias e-OVM,    "e ST0 = (ST0&0xf7ff)+((%1 & 1)<<11)" 
alias e-OV,     "e ST0 = (ST0&0xefff)+((%1 & 1)<<12)" 
alias e-ARP,    "e ST0 = (ST0&0x1fff)+((%1 & 7)<<13)" 



; Status Register 1 (ST1) Fields 

alias wa-PM,    "wa ST1&3,PM" 
alias wa-XF,    "wa (ST1>>4)&1,XF" 
alias wa-HM,    "wa (ST1>>6)&1,HM" 
alias wa-C,     "wa (ST1>>9)&1,C" 
alias wa-SXM,   "wa (ST1>>10)&1,SXM" 
alias wa-TC,    "wa (ST1>>11)&1,TC" 
alias wa-CNF,   "wa (ST1>>12)&1,CNF" 
alias wa-ARB,   "wa (ST1>>13)&7,ARB" 

alias ?-PM,     "? ST1&3" 
alias ?-XF,     "? (ST1>>4)&1" 
alias ?-HM,     "? (ST1>>6)&1" 
alias ?-C,      "? (ST1>>9)&1" 
alias ?-SXM,    "? (ST1>>10)&1" 
alias ?-TC,     "? (ST1>>11)&1" 
alias ?-CNF,    "? (ST1>>12)&1" 
alias ?-ARB,    "? (ST1>>13)&7" 

alias e-PM,     "e ST1 = (ST1&0xfffc) + (%1 & 3)" 
alias e-XF,     "e ST1 = (ST1&0xffef) + ((%1 & 1)<<4)" 
alias e-HM,     "e ST1 = (ST1&0xffbf) + ((%1 & 1)<<6)" 
alias e-C,      "e ST1 = (ST1&0xfdff) + ((%1 & 1)<<9)" 
alias e-SXM,    "e ST1 = (ST1&0xfbff) + ((%1 & 1)<<10)" 
alias e-TC,     "e ST1 = (ST1&0xf7ff) + ((%1 & 1)<<11)" 
alias e-CNF,    "e ST1 = (ST1&0xefff) + ((%1 & 1)<<12)" 
alias e-ARB,    "e ST1 = (ST1&0x1fff) + ((%1 & 7)<<13)" 

Figure 1: regs.cmd file listing 



; Processor Mode Status Register (PMST) Fields 

alias wa-BRAF,  "wa PMST&1,BRAF" 
alias wa-TRM,   "wa (PMST>>1)&1,TRM" 
alias wa-NDX,   "wa (PMST>>2)&1,NDX" 
alias wa-MPMC,  "wa (PMST>>3)&1,MPMC" 
alias wa-RAM,   "wa (PMST>>4)&1,RAM" 
alias wa-OVLY,  "wa (PMST>>5)&1,OVLY" 
alias wa-AVIS,  "wa (PMST>>7)&1,AVIS" 
alias wa-IPTR,  "wa (PMST>>11)&0x1f,IPTR" 

alias ?-BRAF,   "? PMST&1" 
alias ?-TRM,    "? (PMST>>1)&1" 
alias ?-NDX,    "? (PMST>>2)&1" 
alias ?-MPMC,   "? (PMST>>3)&1" 
alias ?-RAM,    "? (PMST>>4)&1" 
alias ?-OVLY,   "? (PMST>>5)&1" 
alias ?-AVIS,   "? (PMST>>7)&1" 
alias ?-IPTR,   "? (PMST>>11)&0x1f" 

alias e-BRAF,   "e PMST=(PMST&0xfffe) + (%1 & 1)" 
alias e-TRM,    "e PMST=(PMST&0xfffd) + ((%1 & 1)<<1)" 
alias e-NDX,    "e PMST=(PMST&0xfffb) + ((%1 & 1)<<2)" 
alias e-MPMC,   "e PMST=(PMST&0xfff7) + ((%1 & 1)<<3)" 
alias e-RAM,    "e PMST=(PMST&0xffef) + ((%1 & 1)<<4)" 
alias e-OVLY,   "e PMST=(PMST&0xffdf) + ((%1 & 1)<<5)" 
alias e-AVIS,   "e PMST=(PMST&0xff7f) + ((%1 & 1)<<7)" 
alias e-IPTR,   "e PMST=(PMST&0x07ff) + ((%1 & 0x1f)<<11)" 



; Circular Buffer Control Register (CBCR) Fields 

alias wa-CAR1,  "wa CBCR&7,CAR1" 
alias wa-CENB1, "wa (CBCR>>3)&1,CENB1" 
alias wa-CAR2,  "wa (CBCR>>4)&7,CAR2" 
alias wa-CENB2, "wa (CBCR>>7)&1,CENB2" 

alias ?-CAR1,   "? CBCR&7" 
alias ?-CENB1,  "? (CBCR>>3)&1" 
alias ?-CAR2,   "? (CBCR>>4)&7" 
alias ?-CENB2,  "? (CBCR>>7)&1" 

alias e-CAR1,   "e CBCR=(CBCR&0xfff8) + (%1 & 7)" 
alias e-CENB1,  "e CBCR=(CBCR&0xfff7) + ((%1 & 1)<<3)" 
alias e-CAR2,   "e CBCR=(CBCR&0xff8f) + ((%1 & 7)<<4)" 
alias e-CENB2,  "e CBCR=(CBCR&0xff7f) + ((%1 & 1)<<7)" 

Figure 1: regs.cmd file listing (continued) 



Since the debugger does not support access to I/O ports natively, each access 
requires writing an expression. To avoid having to rewrite these expressions in each debug 
session, I created aliases and stored them in a file, "io.cmd". If you add 

take io.cmd,0 

to your evminit.cmd file, these aliases will be loaded each time you invoke the 
debugger. (The ,0 suppresses echo of the aliases in the command window during load.) 
You use these alias commands in the following ways: 

IN 0            ; reads I/O port 0 
IN 0xc          ; reads I/O port 0xc (12) 
OUT 5,0xfe      ; writes 0xfe to I/O port 5 
OUT 0xf,0       ; writes 0 to I/O port 0xf (15) 



alias IN,       "? *%1@io" 
alias OUT,      "e *%1@io = %2" 

Figure 2: io.cmd file listing 



